Introduction

This document describes the semantic annotation of 12 Dutch verbs: herroepen, heffen, huldigen, haten, herhalen, herinneren, diskwalificeren, harden, herstellen, helpen, haken, herstructureren. Both the distribution of the sense tags as attributed by anonymous annotators and their corrected versions will be presented. Before describing the schema followed by each section, some terminological clarification is in order.

Small glossary

Majority sense
A sense that was assigned to a token by the majority of its annotators (at least 2).
The act of assigning such a sense is called an agreement and the annotators may be called agreeing annotators.
When the annotators did not agree on any given sense, the majority sense is no_agreement.
Alternative (sense)
A sense that was assigned to a token by a minority of its annotators (only 1).
The act of assigning such a sense is called a disagreement or disagreeing annotation and the annotator may be called disagreeing/dissenting annotator.
Full agreement
The case that all 3-4 annotators of a token assign the same sense.
Geen tag (‘no tag’)
Assignment of a “none of the above” tag. Based on the annotators’ comments, these cases were classified as wrong_lemma, not_listed, unclear or between.
Final sense
The sense tag assigned by us to a given token, considering but not fully relying on the majority sense and comments.
Batch
Set of 40 tokens of the same lemma annotated by the same group of 3-4 annotators.
The annotators of the first batch of lemma X don’t need to match those of the first batch of lemma Y, but will normally share the same batches in four or five different verbs. In a few cases, one person could annotate two batches of the same lemma.
Normally the Netherlandic sources are in the first batches and the Flemish ones in the last.
Cue
Context word selected by an annotator as informative/helpful for assigning a sense.
Only cues selected by agreeing annotators are considered, and only if the sense those annotators assigned also matches the final sense.
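The definitions of majority sense, alternative and full agreement can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, and the treatment of a 2-2 tie between four annotators as no_agreement is our reading of the glossary, not something it states explicitly.

```python
from collections import Counter

NO_AGREEMENT = "no_agreement"


def majority_sense(annotations):
    """Return the sense chosen by at least 2 annotators, else 'no_agreement'.

    `annotations` holds the sense tags given to one token by its 3-4 annotators.
    Assumption: a tie for the top sense (e.g. 2 vs. 2) counts as no agreement.
    """
    if not annotations:
        return NO_AGREEMENT
    ranked = Counter(annotations).most_common()
    top_sense, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < 2 or tied:
        return NO_AGREEMENT
    return top_sense


def full_agreement(annotations):
    """True when every annotator of the token assigned the same sense."""
    return len(set(annotations)) == 1


def alternatives(annotations):
    """Senses assigned by exactly one annotator, other than the majority sense."""
    major = majority_sense(annotations)
    counts = Counter(annotations)
    return sorted(s for s, n in counts.items() if n == 1 and s != major)
```

For a token tagged herroepen_1, herroepen_1 and herroepen_2 by its three annotators, the majority sense is herroepen_1 and herroepen_2 is the alternative.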

Schema of the descriptions

For each lemma, the following information will be discussed: original senses and annotations, final senses after a revised reading of the concordance, the most frequent dependency paths stemming from the target, lists of tokens to look for in the vector space models and whether any tokens must be removed from the concordance and why. The next paragraphs will explain in more detail what to expect in each subsection.

Original senses and annotations

First, the original definitions and examples as given to the annotators will be shown, next to their English translations.

Second, the frequency of the annotations by each annotator will be shown in a barplot, which illustrates both the distribution of the senses across batches and the level of agreement within each batch. Next to this first plot, disagreeing annotations will be shown. One general plot will summarize how many disagreements occurred in each batch, by which annotator and against which majority sense, to assess whether confusion is spread or concentrated on certain annotators.

Further plots per sense (in their respective tabs) will show which sense each annotator assigned to the tokens with a given majority sense, to get a better idea of which senses were more problematic and whether the issue was spread or concentrated on some annotators.

Final senses

After a revised reading of the concordances, disagreements may be solved and sometimes even sense tags reassigned. If one of the original senses turns out to be too infrequent or not as expected, it will be removed, and new tags may be included for tokens that don’t conform to any of the original senses, especially if the annotators reported the issue.

This section will address the final distribution: while majority sense still refers to the tag assigned by the majority of the annotators, final sense is the tag that will be used when modelling the lemma, and it might overlap to a greater or lesser degree with the original distribution.

After reporting modifications to the definitions, if any, three sections follow: “Original versus final sense distribution”, “Reliable cues” and “Most frequent dependency paths”.

Original versus final sense distribution

The first plot and table show which kind of modification was applied to each token annotation:

majority
The majority sense was accepted.
correct
The majority sense was not accepted, and another one of the original tags was assigned instead.
new
A new sense tag, not contemplated in the original senses, is assigned. It could either constitute a new tag at the same level as the others or be subordinated as a subsense to one of the original tags.
idiom
An idiomatic expression was identified. It could either work as a new tag at the same level as the others or be subordinated to one of the original tags.
remove
The token was removed.

A final plot correlates original majority sense and final sense assignments, splitting between cases with and without full agreement to check if there are more corrections in the latter than in the former.
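As a rough illustration of how the five modification types could be derived and cross-tabulated against the majority sense, consider the following Python sketch. The encoding is an assumption made for the example: removed tokens are taken to carry the final tag "remove", and idioms are taken to have their own final tags listed in a hypothetical `idiom_tags` collection.

```python
from collections import Counter


def classify_action(majority, final, original_senses, idiom_tags=()):
    """Map one token to majority / correct / new / idiom / remove.

    Assumptions: removed tokens have final == "remove" and idiomatic
    expressions have final tags listed in `idiom_tags`.
    """
    if final == "remove":
        return "remove"
    if final in idiom_tags:
        return "idiom"
    if final == majority:
        return "majority"
    if final in original_senses:
        return "correct"
    return "new"


def cross_tabulate(tokens, original_senses, idiom_tags=()):
    """Count (majority sense, action) pairs over (majority, final) tuples."""
    table = Counter()
    for majority, final in tokens:
        action = classify_action(majority, final, original_senses, idiom_tags)
        table[(majority, action)] += 1
    return table


# Toy data with made-up tags, not taken from the annotations.
senses = {"s_1", "s_2"}
toy = [("s_1", "s_1"), ("s_1", "s_2"), ("no_agreement", "s_1"),
       ("s_2", "remove"), ("s_1", "s_1_idiom"), ("not_listed", "s_3")]
toy_table = cross_tabulate(toy, senses, idiom_tags={"s_1_idiom"})
```

Each key of `toy_table` corresponds to one cell of the cross-tabulations shown per lemma.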

Reliable cues

A series of tables will show the top ranked cues per sense. The goal is to have an idea of which context words characterize a given sense better, among the final senses, so only annotations where the assigned sense tag matches the final sense will be taken into account. Furthermore, the individual assignments and cue selection are not particularly reliable, so a threshold of two votes per cue was set. That means that we will only count cases where at least two annotators selected a cue as such for a sense that was also the final sense. If they didn’t both choose the same sense and context word, it is ignored.
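The two-vote threshold can be sketched as follows. The data layout (one `(sense, cues)` pair per annotator, with cues identified by arbitrary strings) and the function name are assumptions for illustration only.

```python
from collections import Counter


def reliable_cues(token_annotations, final_sense, min_votes=2):
    """Return the cues chosen by at least `min_votes` annotators who
    also assigned the final sense to this token.

    `token_annotations` is a list of (sense, cues) pairs, one per annotator;
    `cues` is a collection of context-word identifiers (hypothetical format).
    """
    votes = Counter()
    for sense, cues in token_annotations:
        if sense == final_sense:
            votes.update(set(cues))  # one vote per annotator per cue
    return sorted(cue for cue, n in votes.items() if n >= min_votes)


# Made-up annotations of a single token.
example = [
    ("herroepen_1", {"beslissing@L2"}),
    ("herroepen_1", {"beslissing@L2", "rechtbank@L5"}),
    ("herroepen_2", {"rechtbank@L5"}),
]
kept = reliable_cues(example, "herroepen_1")
```

Here only the first cue survives: the second one was chosen twice, but one of those votes came from an annotator whose sense does not match the final sense.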

One technical disclaimer is in order: when the annotation procedure started, there was a bug in the section of the annotation tool responsible for recording the cues. If the same wordform occurred more than once in the context of a given token, and one of its instances was selected as cue, only the first instance was recorded, regardless of whether it was correct or not. The bug was identified and the annotators were notified, but not all of them corrected the results.
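The effect of this bug can be mimicked with a small Python example. It is not the code of the annotation tool, only a hypothetical reconstruction of the described behaviour: looking a cue up by its wordform returns the first matching position.

```python
def record_cue_buggy(context, selected_index):
    """Mimic the described bug: store the first occurrence of the wordform."""
    return context.index(context[selected_index])


def record_cue_fixed(context, selected_index):
    """Store the position that was actually selected."""
    return selected_index


# Made-up context in which "de" occurs twice.
context = ["de", "uitspraak", "van", "de", "rechter"]
buggy = record_cue_buggy(context, 3)   # the second "de" was selected
fixed = record_cue_fixed(context, 3)
```

In this example the buggy version records position 0 instead of 3, which would change both the relative position and the dependency path reported for the cue.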

The first table shows the top 10 lemma-part-of-speech combinations selected as cues per sense. Then, for each of the senses a dedicated table repeats the top 10 lemma-part-of-speech combinations and adds the top 10 relative positions, dependency paths and dependency path lengths (aka steps).

Relative position
The relative position of a context word in relation to the target is expressed as a combination of a letter (L or R) and a number (minimum 1) so that R1 is the first token to the right of the target, L2 is the second token to the left of the target, etc.
Dependency path
The path from each context word to the target along the dependency tree was calculated. In the formula, #T represents the target and -> the direction from head to dependent, followed by the dependency relation and the dependent separated by a colon, like in the dependency module; CW represents the selected cue. E.g.: #T->obj1:CW means that the cue is the direct object of the target; X->[mod:CW,det:#T] would mean that the target is the determiner of an item X of which the cue is a modifier.
A description of the tags can be found here.
The code that drew these paths is rather rough, so some weird patterns might come up (e.g. moet->vc:word->[su:CW,vc:#T] when it should’ve been moet->[su:CW,vc:word->vc:#T]), but they are a minority and it should work fine in the dependency module.
NA values indicate that the cue is beyond the sentence, and therefore has no dependency path to the target.
Steps (path length)
The steps required to go from the target to the cue in the dependency tree, e.g.: 1 for #T->obj1:CW, 2 for X->[mod:CW,det:#T]
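Relative position and path length as defined above can be illustrated with a short Python sketch. It does not reproduce the code that generated the paths in this document; the representation of a parse as a mapping from token index to head index, the function names and the toy parses are all assumptions.

```python
def relative_position(target_index, cw_index):
    """Return e.g. 'L2' or 'R1' for a context word relative to the target."""
    offset = cw_index - target_index
    if offset == 0:
        raise ValueError("the target is not its own context word")
    side = "R" if offset > 0 else "L"
    return f"{side}{abs(offset)}"


def chain_to_root(node, heads):
    """List a node followed by its successive heads up to the root."""
    chain = [node]
    while heads[chain[-1]] is not None:
        chain.append(heads[chain[-1]])
    return chain


def path_steps(target_index, cw_index, heads):
    """Number of dependency edges between target and cue.

    `heads` maps each token index of the sentence to the index of its head
    (None for the root). Returns None (the NA case) when the cue lies
    outside the sentence and thus has no path to the target.
    """
    if cw_index not in heads:
        return None
    up_target = chain_to_root(target_index, heads)
    up_cw = chain_to_root(cw_index, heads)
    for steps_from_cw, node in enumerate(up_cw):
        if node in up_target:  # lowest common ancestor
            return up_target.index(node) + steps_from_cw
    return None


# Hand-made, simplified parse of "Het besluit werd herroepen":
# werd is the root, besluit its subject, herroepen its verbal complement.
passive = {0: 1, 1: 2, 2: None, 3: 2}
# Hand-made, simplified parse of "herriep de uitspraak".
active = {0: None, 1: 2, 2: 0}
```

With these toy parses, besluit is at L2 and two steps from the target herroepen, matching the shape word->[vc:#T,su:CW], while uitspraak is at R2 and one step from herriep, matching #T->obj1:CW.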

Most frequent dependency paths

A plot will show the frequency of dependency paths that occur in at least half the tokens of some sense. This does not filter the context words in any way, either by bag-of-words distance, part of speech or dependency links (so useless but frequent paths like #T->punct:CW show up), but it may be filtered by frequency or path length.
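The selection criterion for this plot, paths that occur in at least half the tokens of some sense, can be sketched as follows. The input format (one `(sense, paths)` pair per token) and the function name are assumptions for illustration.

```python
from collections import Counter, defaultdict


def frequent_paths(tokens, min_share=0.5):
    """Return the paths found in at least `min_share` of the tokens of
    at least one sense.

    `tokens` is a list of (sense, paths) pairs, where `paths` holds the
    dependency paths attested in the context of that token.
    """
    tokens_per_sense = Counter()
    path_counts = defaultdict(Counter)
    for sense, paths in tokens:
        tokens_per_sense[sense] += 1
        path_counts[sense].update(set(paths))  # count each path once per token
    selected = set()
    for sense, counts in path_counts.items():
        for path, n in counts.items():
            if n / tokens_per_sense[sense] >= min_share:
                selected.add(path)
    return sorted(selected)


# Made-up tokens of two made-up senses.
toy_tokens = [
    ("s_1", {"#T->obj1:CW", "#T->punct:CW"}),
    ("s_1", {"#T->obj1:CW"}),
    ("s_2", {"word->[vc:#T,su:CW]", "#T->punct:CW"}),
    ("s_2", {"#T->su:CW"}),
    ("s_2", {"#T->su:CW", "word->[vc:#T,su:CW]"}),
]
selected_paths = frequent_paths(toy_tokens)
```

Note that #T->punct:CW is kept because it reaches the threshold in one sense, even though it is uninformative; this mirrors the absence of any filtering on the context words.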

Lists

Some tokens or characteristics thereof might be interesting to look at in the vector spaces, but don’t warrant categorizing all the tokens; instead, lists are made. These lists could group attestations of the following phenomena, among others:

Nominalizations
The closest context words and dependency relations will be different, but the lemma still matches the target and can be assigned a sense.
It could make it harder to distinguish between transitive and intransitive constructions.
Garden-path tokens
There is some deceiving context word that could trick the models into grouping the token with the wrong category.
Atypical contexts
The target either occurs in an atypical combination that can still be parsed, or in unexpected contexts such as lists, poetry fragments, etc.
Headlines
Relatively short sentences without punctuation; sometimes the division between sentences is hard for the annotators to find, and/or relevant context words are likely to be found outside (as elaborations of the headline).
Titles
Short independent phrases that are not separated from the rest of the context by sentence delimiters, but for which the external context is rarely helpful. This is a much bigger issue with nouns than with verbs.
Encyclopedic knowledge needed
Cases in which identifying and recognizing proper names in the context goes a long way towards successful disambiguation.

Removed tokens

Some tokens might be excluded from future analysis for any of the following reasons: the token actually belongs to a different lemma, it is a duplicate of another token, or it belongs to a valid category (an original or new sense, an “in between sense”) that is too infrequent (normally <1%).
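The frequency criterion can be illustrated with a minimal Python sketch; the function name and the made-up tags are hypothetical, and the 1% threshold is the "normal" value mentioned above rather than a hard rule.

```python
from collections import Counter


def rare_senses(final_senses, threshold=0.01):
    """Return the tags whose share of the tokens is below `threshold`."""
    counts = Counter(final_senses)
    total = len(final_senses)
    return sorted(s for s, n in counts.items() if n / total < threshold)


# Made-up distribution over 240 tokens (6 batches of 40).
toy_senses = ["s_1"] * 150 + ["s_2"] * 88 + ["s_3"] * 2
too_rare = rare_senses(toy_senses)
```

With 240 tokens per lemma, a 1% threshold flags tags with two tokens or fewer, since 3 out of 240 is already 1.25%.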


HERROEPEN

Original senses and annotations

The tokens of herroepen were annotated with 2 senses in 6 batches; the tags in Table 1 were suggested.

Table 1. Original definitions of ‘herroepen’.
Definitions
herroepen_1
(trans.) m.b.t. wetten, besluiten e.d.: intrekken, niet langer geldig verklaren: een besluit, volmacht, decreet herroepen
(trans.) w.r.t. laws, decisions and such: withdraw, declare not valid anymore: annul a decision, power of attorney, decree
herroepen_2
(trans.) m.b.t. uitspraken, meningen e.d.: terugnemen en rechtzetten: Trump moest weer een van zijn dwaze tweets herroepen
(trans.) w.r.t. statements, opinions and such: retract and correct: Trump had to retract one of his crazy tweets again

Figure 1 shows the sense distribution by annotator and batch and Figure 2, that of the disagreements. Figure 3 shows the sense tags that each annotator of each batch assigned to the tokens with herroepen_1 as majority sense and Figure 4 those for herroepen_2.

General distribution

Both senses seem roughly equally frequent in the first three batches, while herroepen_1 is much more frequent in the other three. There is not too much disagreement between annotators of the same batch and only three tokens without any agreement at all, all of which could nevertheless be assigned a sense.

Figure 1. Distribution of senses of 'herroepen' per annotator and batch.

Figure 2. Distribution of disagreeing annotations of 'herroepen' per annotator and batch.

Disagreement in herroepen_1

There are few disagreements regarding herroepen_1, most of them concentrated on two particular annotators.

Figure 3. Sense annotations of tokens with 'herroepen_1' as majority sense.

Disagreement in herroepen_2

The second sense, herroepen_2, is much less frequent in the last batches, particularly in batch 6, and seems to have a bit more disagreement, mostly concentrated on a couple of specific annotators.

Figure 4. Sense annotations of tokens with 'herroepen_2' as majority sense.

Final senses

The final definitions are the same as the original definitions: no (sub)senses were added or modified.

Original versus final sense distribution

Of the 240 tokens of herroepen, 228 kept their original majority senses, 8 were corrected to another original sense, and 4 were removed.

Table 2 shows in how many tokens with each majority sense which actions were taken, and Figure 5 illustrates the frequency of the final tags. Figure 6 correlates the original majority sense and the final senses.

Figure 5. Final distribution of senses of 'herroepen'.

Table 2. Cross-tabulation of original majority senses of ‘herroepen’ and actions taken.
original      correct  majority  remove
herroepen_1   2        138       1
herroepen_2   2        90        2
no_agreement  3        0         0
unclear       1        0         0
wrong_lemma   0        0         1

Figure 6. Majority and final senses of 'herroepen'.

Reliable cues

Table 3 shows the most frequent context words selected by the annotators as relevant. Table 4 and Table 5 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags herroepen_1 and herroepen_2.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 20 have no cues that match these criteria. 146 have one single cue and 74 have more than one (up to 10).

Across senses

The two senses have different collocates, three of which stand out by their frequency. The last columns represent a token with wrong_lemma as majority sense, (8) below, in which the annotators tagged a name as cue, presumably as indicating that the target did not belong to any of the other categories.

Table 3. Frequency of cues by sense, counted by type.
Rank  herroepen_1        n    herroepen_2        n    remove           n
1     beslissing/noun    30   verklaring/noun    17   Verstraete/name  1
2     besluit/noun       9    uitspraak/noun     15   -                0
3     veroordeling/noun  5    bekentenis/noun    3    -                0
4     vonnis/noun        5    zal/verb           3    -                0
5     rechtbank/noun     4    zeg/verb           3    -                0
6     wet/noun           4    zijn/det           3    -                0
7     decreet/noun       3    bewering/noun      2    -                0
8     schorsing/noun     3    dat/det            2    -                0
9     word/verb          3    getuige/noun       2    -                0
10    afspraak/noun      2    het/det            2    -                0

herroepen_1

Besides the list of types that were selected as cues, we can see that they mostly occur in the closest six slots to the left of the target or up to three to the right, up to 3 or 4 steps away in the dependency path and mainly as direct object (#T->obj1:CW) but also as passive subject (word->[vc:#T,su:CW], word being the verb worden) of the target.

In eight cases, the cues were beyond the sentence: these are three tokens where the theme is not specified within the sentence (“Het werd nooit herroepen.”, “Zoiets kan niet worden herroepen.”, “Dat werd later half herroepen…”2) and one where it was, but some context words beyond the sentence might be considered helpful too.

Table 4. Frequency of context words as cues of herroepen_1 by attribute.
      Type                       Position       Dependency path                     Path length
Rank  cw_type               n    position  n    path                           n    steps  n
1     beslissing/noun       30   L2        30   #T->obj1:CW                    69   1      86
2     besluit/noun          9    L3        25   word->[vc:#T,su:CW]            19   2      55
3     veroordeling/noun     5    L1        23   #T->mod:van->obj1:CW           9    3      30
4     vonnis/noun           5    L4        20   #T->su:CW                      9    4      18
5     rechtbank/noun        4    L5        16   NA                             8    NA     8
6     wet/noun              4    L6        14   #T->mod:CW                     5    6      7
7     decreet/noun          3    R2        14   #T->mod:door->obj1:CW          5    5      6
8     schorsing/noun        3    R3        13   ben->[vc:#T,su:CW]             4    8      5
9     word/verb             3    L14       8    CW->vc:#T                      3    7      3
10    afspraak/noun         2    L12       7    kan->vc:word->[vc:#T,su:CW]    3    9      2

herroepen_2

Besides the list of types that were selected as cues, we can see that they mostly occur in the four closest slots to the left and right of the target (though notably not in the first slot to the right), as direct object (#T->obj1:CW) of the target or in any case one or two steps away in the dependency path. The nine cues beyond the sentence correspond to four tokens that also have some cue inside the sentence.3

Table 5. Frequency of context words as cues of herroepen_2 by attribute.
      Type                       Position       Dependency path                     Path length
Rank  cw_type               n    position  n    path                           n    steps  n
1     verklaring/noun       17   L2        18   #T->obj1:CW                    60   1      76
2     uitspraak/noun        15   L1        14   NA                             9    2      31
3     bekentenis/noun       3    L4        14   #T->mod:CW                     6    3      9
4     zal/verb              3    L3        13   #T->obj1:uitspraak->det:CW     3    NA     9
5     zeg/verb              3    R3        13   CW->vc:#T                      3    4      7
6     zijn/det              3    R2        10   word->[vc:#T,su:CW]            3    6      6
7     bewering/noun         2    L6        8    #T->mod:van->obj1:CW           2    5      4
8     dat/det               2    L5        7    #T->obj1:en->cnj:CW            2    7      2
9     getuige/noun          2    L7        6    #T->obj1:standpunt->mod:CW     2    8      1
10    het/det               2    L10       5    CW->cnj:#T                     2    -      0

Most frequent dependency paths

Figure 7 shows the most frequent dependency paths colored by sense tag. There does seem to be a preference for the passive construction (X->[vc:#T,su:CW]) for herroepen_1 and for the active one (#T->obj1:CW) for herroepen_2.

Figure 7. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (11 tokens, mostly of herroepen_1);
  • garden-path tokens, as is the case of (1) through (3), of herroepen_1: the object is uitspraak, which in its meaning “utterance” is a typical object of herroepen_2, but here means “sentence” (in Court context), which has legal consequences of the herroepen_1 kind.
  • atypical contexts, as is the case of (4) through (7), of herroepen_2. In the first one, it is atypical that someone retracts someone else’s statements; in the second, that someone retracts all their messages (it does not sound like a normal sort of retraction); the third one is in verse; and the fourth one is a reflexive construction.
  1. 2002-05-04 mr , lcp KV Mechelen krijgt licentie BRUSSEL - Het beroepscomité herriep gisteren de uitspraak van de licentiecommissie en besliste om KV Mechelen toch zijn licentie te geven .
  2. moment van hun geboorte hun ouders nog geen Hongkongse burgers waren . Daarmee herriep het hof zijn eigen uitspraak en gaf het Peking gelijk . Juristen nemen
  3. van medeplichtigheid aan de moord op Fortuyn . De uitspraak werd dinsdag weer herroepen , nadat de LPF zich zondag ook al had gedistantieerd van eerdere beschuldigingen aan het adres van Kok en Melkert .
  4. Maar het lijkt wel of het gemak waarmee hij de vergissingen van zijn voorgangers herroept , tegelijkertijd gepaard gaat met een hang naar nieuwe tegenstrijdigheden . ’ Ik
  5. Covad , een jong Californisch bedrijf dat de DSL-verbinding daadwerkelijk tot stand brengt met een zogenoemde bridge ( een soort modem ) , moest al zijn e-mails en telefonische boodschappen die in augustus binnenstroomden , herroepen . " Viktor , we zijn hard aan het werk om jouw DSL-verbinding
  6. Welnu : ’ Het ware middelpunt van ons heelal Is niet de Aarde , doch de Zon ’ Hetgeen men toen niet maken kon ‘t Was namelijk nog lang geen Carnaval Weldra verscheen hij voor ’t Gerecht De perspectieven waren slecht Dus met het lot van Bruno in ’t verschiet Herriep hij braaf zijn ketterij’ Maar toch beweegt ze , ’ bromde hij Want schuldbewust was Galilei niet .
  7. Senaat telt iedere zetel . Op één punt zal Bush zich vandaag misschien herroepen . Algemeen werd verwacht dat hij striktere regels voor financiële verslaglegging zou afkondigen

Removed tokens

4 tokens will be removed from the concordance: (8), where the target seems to be a one-word headline, and three duplicates.

  8. toe te laten . Later viseerde men expliciet de joden . Herroepen Jan Verstraete kreeg voor zijn onderzoek niet alleen inzage in een tot op

HEFFEN

Original senses and annotations

The tokens of heffen were annotated with 2 senses in 6 batches; the tags in Table 6 were suggested.

Table 6. Original definitions of ‘heffen’.
Definitions
heffen_1
(trans.) m.b.t. materiële zaken: in de hoogte brengen, optillen: met geheven hoofd; hij heft met gemak 80 kilo in de hoogte
(trans.) w.r.t. material objects: move to a higher position, lift: lifting their head; he easily lifted 80 kg
heffen_2
(trans.) m.b.t. geld e.d.: invorderen, eisen, opleggen: belasting, rente, accijns heffen
(trans.) w.r.t. money and such: collect, demand, impose: collect tax, interest, excise

Figure 8 shows the sense distribution by annotator and batch and Figure 9, that of the disagreements. Figure 10 shows the sense tags that each annotator of each batch assigned to the tokens with heffen_1 as majority sense and Figure 11 those for heffen_2.

General distribution

The second sense seems to be consistently more frequent than the first one; there is little disagreement, with none at all in the first batch and a maximum of 8 disagreeing annotations by annotator 2 of batch 3.

There are 5 instances with no agreement: one is an instance of a compound hefplateau and the rest, of opheffen.

The 9 tokens with wrong_lemma as majority sense are instances of hebben (i.e. typos), opheffen and aanheffen, and the two with not_listed as majority sense instantiate a discarded sense, “adjourn”.

Figure 8. Distribution of senses of 'heffen' per annotator and batch.

Figure 9. Distribution of disagreeing annotations of 'heffen' per annotator and batch.

Disagreement in heffen_1

There is little disagreement, mostly concentrated on annotator 2 of batch 3.

Figure 10. Sense annotations of tokens with 'heffen_1' as majority sense.

Disagreement in heffen_2

This sense is the most frequent and has very few instances of disagreement.

Figure 11. Sense annotations of tokens with 'heffen_2' as majority sense.

Final senses

The final definitions are the same as the original definitions: the one sense added based on the concordances and suggestions of the annotators, namely “adjourn”, was discarded because of its low frequency. In addition, some idiomatic expressions were identified, but they remain subordinated to heffen_1.

Original versus final sense distribution

Of the 240 tokens of heffen, 161 kept their original majority senses, none were corrected to another original sense, and 22 were removed. 57 tokens were identified as instances of some idiomatic expression.

Table 7 shows in how many tokens with each majority sense which actions were taken, and Figure 12 illustrates the frequency of the final tags. Figure 13 correlates the original majority sense and the final senses.

Figure 12. Final distribution of senses of 'heffen'.

Table 7. Cross-tabulation of original majority senses of ‘heffen’ and actions taken.
original      idiom  majority  remove
heffen_1      57     21        4
heffen_2      0      140       2
no_agreement  0      0         5
not_listed    0      0         2
wrong_lemma   0      0         9

Figure 13. Majority and final senses of 'heffen'.

Reliable cues

Table 8 shows the most frequent context words selected by the annotators as relevant. Table 9 and Table 10 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags heffen_1 and heffen_2.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 8 have no cues that match these criteria. 161 have one single cue and 71 have more than one (up to 5).

Across senses

The most frequent cues for heffen_1 are the collocates corresponding to the idioms identified: “het glas heffen”, “de handen ten hemel heffen”, “een vinger(tje) heffen”. The two most frequent for heffen_2 are indeed very frequent, but it must be taken into account that they often occur in compounds (such as bronbelasting at the end of the table), which have lower frequencies themselves. The removed tokens do not exhibit a stable pattern of cues, which is understandable. These are mostly tokens of opheffen and aanheffen and they were not always identified by the annotators as a different lemma (sometimes as a “different sense”).

Table 8. Frequency of cues by sense, counted by type.
Rank  heffen_1      n    heffen_2              n    remove           n
1     glas/noun     33   belasting/noun        51   op/part          3
2     hand/noun     17   tol/noun              10   aan/prep         1
3     het/det       17   accijns/noun          9    ban_vloek/noun   1
4     hemel/noun    9    te/comp               5    belang/noun      1
5     te/prep       9    entree/noun           4    controleer/verb  1
6     arm/noun      6    schenking_recht/noun  3    klap/noun        1
7     de/det        4    statie_geld/noun      3    krijg/verb       1
8     hun/det       4    successie_recht/noun  3    laat/verb        1
9     vinger/noun   4    boete/noun            2    op/prep          1
10    zijn/det      4    bron_belasting/noun   2    plan/noun        1

heffen_1

Besides the list of types that were selected as cues, we can see that they mostly occur in the closest 5 slots to the left and right of the target and up to two or three steps away in the dependency paths; they are mostly the object of the target but also the determiner of the objects glas and hand (in the aforementioned idioms).

Table 9. Frequency of context words as cues of heffen_1 by attribute.
      Type                       Position       Dependency path                     Path length
Rank  cw_type               n    position  n    path                           n    steps  n
1     glas/noun             33   R2        24   #T->obj1:CW                    65   1      78
2     hand/noun             17   L1        21   #T->obj1:glas->det:CW          13   2      48
3     het/det               17   R3        18   #T->obj1:hand->det:CW          7    3      13
4     hemel/noun            9    L2        17   #T->mod:CW                     6    4      6
5     te/prep               9    R1        12   word->[vc:#T,su:CW]            4    5      1
6     arm/noun              6    R4        11   #T->mod:te->obj1:CW            3    6      1
7     de/det                4    R5        10   ->[ROOT:#T,ROOT:wil->dp:CW]    2    7      1
8     hun/det               4    L3        7    #T->ld:CW                      2    -      0
9     vinger/noun           4    R6        7    #T->obj1:arm->det:CW           2    -      0
10    zijn/det              4    L4        3    #T->obj1:arm->mod:CW           2    -      0

heffen_2

Besides the list of types that were selected as cues, we can see that they mostly occur in the closest three slots to the left of the target or, to a lesser degree, to the right, as either direct object (#T->obj1:CW) or passive subject (word->[vc:#T,su:CW], word being the verb worden) of the target and up to three steps away in the dependency path. The ten cues beyond the sentence correspond to 8 tokens: in 6 of them, there are also (enough) cues inside the sentence; in another one, the same context word occurs inside and outside the sentence and the latter was likely registered by a technical mistake; and in the last one the object of heffen is a pronoun and its antecedent, the cue, is indeed beyond the sentence.

Table 10. Frequency of context words as cues of heffen_2 by attribute.
      Type                       Position       Dependency path                     Path length
Rank  cw_type               n    position  n    path                           n    steps  n
1     belasting/noun        51   L2        49   #T->obj1:CW                    85   1      92
2     tol/noun              10   L1        44   word->[vc:#T,su:CW]            14   2      38
3     accijns/noun          9    L3        11   NA                             10   3      22
4     te/comp               5    L4        9    #T->mod:van->obj1:CW           6    NA     10
5     entree/noun           4    R3        7    CW->body:#T                    5    4      7
6     schenking_recht/noun  3    L5        6    CW->mod:die->body:word->vc:#T  4    7      6
7     statie_geld/noun      3    R2        6    #T->obj1:of->cnj:CW            3    5      2
8     successie_recht/noun  3    R1        5    CW->mod:die->body:#T           3    -      0
9     boete/noun            2    R4        5    #T->su:CW                      2    -      0
10    bron_belasting/noun   2    L10       4    zal->vc:word->[vc:#T,su:CW]    2    -      0

Most frequent dependency paths

Figure 14 shows the most frequent dependency paths colored by sense tag. While a direct object seems frequent in both, the presence of a determiner for said object is much more prominent for heffen_1, as well as the subject of the verb; verbs of which the target is a complement, on the other hand, are more present in heffen_2. While the passive construction was a relevant cue, the 14 instances in which the subject was tagged as cue were its only occurrences: it thus represents only 10% of the tokens with this sense.

Figure 14. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (6 tokens, all from heffen_2);
  • headlines (1 token, from heffen_2);
  • atypical context ((9), where the object is missing);
  • idiomatic expressions: 35 instances of “het glas heffen”, 15 of “de handen ten hemel heffen” and 7 of “de vinger heffen” or a variant thereof. All of these are considered cases of heffen_1.
  9. En dan nog : om de attributen te plaatsen , moet je ook heffen . Sinds november vorig jaar ga ik haast wekelijks en dit op eigen

Removed tokens

19 tokens will be removed because they are not instances of heffen, 2 because they instantiate another sense, namely “adjourn”, and one because it is too exceptional (“de loftrompet heffen”). Of the ones that do not match the target lemma, one is an instance of hefplateau ‘lifting platform’, while the rest are occurrences of hebben, where a typo led to a wrong annotation, and opheffen or aanheffen, where the particle was not counted as part of the verb.


HULDIGEN

Original senses and annotations

The tokens of huldigen were annotated with 2 senses in 6 batches; the tags in Table 11 were suggested.

Table 11. Original definitions of ‘huldigen’.
Definitions
huldigen_1
(trans.) iets of iem. eer bewijzen, vieren: we huldigen de uitvinder van de herbruikbare broodzak
(trans.) celebrate, pay homage to someone or something: we honor the inventor of the reusable bread bag
huldigen_2
(trans.) erkennen, aankleven, toegedaan zijn: een opvatting, mening, theorie huldigen
(trans.) acknowledge, follow, be committed to: hold a view, an opinion, a theory

Figure 15 shows the sense distribution by annotator and batch and Figure 16, that of the disagreements. Figure 17 shows the sense tags that each annotator of each batch assigned to the tokens with huldigen_1 as majority sense and Figure 18 those for huldigen_2.

General distribution

The senses seem to be equally frequent in the first two batches, but huldigen_2 is more frequent in the third batch and extremely infrequent in the other three. Except for batch 6, where one annotator disagreed in 7 huldigen_1 cases, at least 90% of the tokens of each batch have full agreement. Only 2 have no agreement at all: one was resolved to huldigen_1 and the other one was an instance of inhuldigen. The one case with unclear as majority sense was removed.

Figure 15. Distribution of senses of 'huldigen' per annotator and batch.

Figure 16. Distribution of disagreeing annotations of 'huldigen' per annotator and batch.

Disagreement in huldigen_1

The first sense covers a quarter of the tokens of batch 3, half of the first two batches and more than three quarters of the other three; other than the huldigen_2 suggestions of the third annotator of batch 6, there is barely any disagreement.

Figure 17. Sense annotations of tokens with 'huldigen_1' as majority sense.

Disagreement in huldigen_2

The second sense covers 25% to 45% of the first two batches and almost three quarters of the third, where there are some disagreements, but 10% or less of the other three (probably lect-dependent, since the last batches tend to have Flemish tokens).

Figure 18. Sense annotations of tokens with 'huldigen_2' as majority sense.

Final senses

The final definitions are the same as the original definitions: no (sub)senses were added or modified.

Original versus final sense distribution

Of the 240 tokens of huldigen, 229 kept their original majority senses, 1 was corrected to another original sense, and 10 were removed.

Table 12 shows in how many tokens with each majority sense which actions were taken, and Figure 19 illustrates the frequency of the final tags. Figure 20 correlates the original majority sense and the final senses.

Figure 19. Final distribution of senses of 'huldigen'.

Table 12. Cross-tabulation of original majority senses of ‘huldigen’ and actions taken.
original correct majority remove
huldigen_1 0 162 6
huldigen_2 0 67 2
no_agreement 1 0 1
unclear 0 0 1
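
The totals reported before Table 12 can be recomputed from its rows. The short sketch below does so; the dictionary layout and variable names are our own illustration, while the numbers are copied from the table.

```python
# Rows of Table 12: majority sense -> (correct, majority, remove)
table_12 = {
    "huldigen_1": (0, 162, 6),
    "huldigen_2": (0, 67, 2),
    "no_agreement": (1, 0, 1),
    "unclear": (0, 0, 1),
}

# Column sums: tokens corrected, tokens that kept the majority sense, tokens removed.
corrected = sum(row[0] for row in table_12.values())
kept = sum(row[1] for row in table_12.values())
removed = sum(row[2] for row in table_12.values())
total = corrected + kept + removed

print(kept, corrected, removed, total)  # 229 1 10 240
```
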
Figure 20. Majority and final senses of 'huldigen'.

Reliable cues

Table 13 shows the most frequent context words selected by the annotators as relevant. Table 14 and Table 15 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags huldigen_1 and huldigen_2.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 15 have no cues that match these criteria. 105 have one single cue and 120 have more than one (up to 6).
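
The selection criterion can be sketched as follows: a context word counts as a cue for a token only if at least two annotators who assigned the final sense selected it. The function name, the input format and the toy annotations below are hypothetical and only illustrate the filter; they are not taken from the dataset.

```python
from collections import Counter

def reliable_cues(annotations, final_sense, min_annotators=2):
    """annotations: list of (sense_tag, cue_words) pairs, one per annotator."""
    votes = Counter()
    for sense, cues in annotations:
        if sense == final_sense:      # ignore annotators who chose another sense
            votes.update(set(cues))   # one vote per annotator per context word
    return sorted(word for word, n in votes.items() if n >= min_annotators)

# Toy example (invented): two annotators agree on the final sense and on one cue.
toy = [
    ("huldigen_1", {"kampioen", "als"}),
    ("huldigen_1", {"kampioen"}),
    ("huldigen_2", {"kampioen", "principe"}),
]
print(reliable_cues(toy, "huldigen_1"))  # ['kampioen']
```

A token for which this filter returns an empty list would be one of the tokens without cues that match the criteria.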

Across senses

Nouns are selected as cues for both senses, although they belong to quite different domains; in addition, the prepositions als and voor stand out for huldigen_1. The few cues for the discarded tokens can be neglected, although the first one, in, is telling: most of those tokens were instances of inhuldigen.

Table 13. Frequency of cues by sense, counted by type.
Rank huldigen_1 n huldigen_2 n remove n
1 kampioen/noun 15 principe/noun 14 in/adj 1
2 als/prep 10 standpunt/noun 9 te/comp 1
3 voor/prep 8 het/det 7 - 0
4 gemeente_bestuur/noun 6 opvatting/noun 7 - 0
5 winnaar/noun 6 de/det 3 - 0
6 goed/adj 5 een/det 2 - 0
7 laureaat/noun 5 ik/pron 2 - 0
8 speler/noun 5 mening/noun 2 - 0
9 verdienstelijk/adj 5 van/prep 2 - 0
10 word/verb 5 aandeelhouder_schap/noun 1 - 0

huldigen_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest 5 slots on any side of the target and up to 5 steps away in the dependency path; the most popular relations are the direct object (#T->obj1:CW, but also #T->obj1:en->cnj:CW, coordinated direct object), the passive subject (word->[vc:#T,su:CW] and ben->[vc:#T,su:CW]), a modifier (#T->mod:CW, mostly filled in by the prepositions als and voor, but also door for the agent of a passive construction) and the objects depending on such modifiers.
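
To make the position labels in the tables concrete (L1, R2 and so on), the sketch below derives such a label from the index of the target and the index of the context word. It assumes that L and R stand for slots to the left and right of the target and that the number is the distance in tokens; whether punctuation counts as a slot is not stated here, and the example sentence is invented rather than taken from the corpus.

```python
def position_label(target_index, cue_index):
    """Label a context word by its side and distance from the target (e.g. L1, R2)."""
    offset = cue_index - target_index
    if offset == 0:
        raise ValueError("the context word cannot be the target itself")
    side = "L" if offset < 0 else "R"
    return f"{side}{abs(offset)}"

# Invented example sentence, tokenized on whitespace.
tokens = "De kampioen werd gisteren gehuldigd door het gemeentebestuur".split()
target = tokens.index("gehuldigd")
labels = {tokens[i]: position_label(target, i)
          for i in range(len(tokens)) if i != target}
print(labels["kampioen"], labels["door"])  # L3 R1
```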

Eight cues are located beyond the sentence, in 7 tokens. In five cases, there are also (enough) cues inside the sentence; in another, the same wordform occurs outside and inside and the former was registered, while in the last one the annotators’ behaviour is difficult to explain.4

Table 14. Frequency of context words as cues of huldigen_1 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 kampioen/noun 15 L1 32 #T->obj1:CW 51 2 132
2 als/prep 10 R2 32 word->[vc:#T,su:CW] 38 1 85
3 voor/prep 8 L2 31 #T->mod:CW 20 3 48
4 gemeente_bestuur/noun 6 R1 29 #T->mod:als->obj1:CW 11 4 26
5 winnaar/noun 6 L5 26 #T->mod:voor->obj1:CW 10 5 13
6 goed/adj 5 R3 23 #T->obj1:en->cnj:CW 10 NA 8
7 laureaat/noun 5 L3 19 #T->mod:door->obj1:CW 9 6 5
8 speler/noun 5 R4 16 ben->[vc:#T,su:CW] 8 8 2
9 verdienstelijk/adj 5 L6 15 NA 8 - 0
10 word/verb 5 L4 13 CW->vc:#T 7 - 0

huldigen_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the first three slots to the right of the target or the first one to the left, up to 2 steps away in the dependency path, mainly as direct object (#T->obj1:CW, but also #T->obj1:en->cnj:CW, coordinated direct object) and sometimes subject (#T->su:CW) of the target.

Table 15. Frequency of context words as cues of huldigen_2 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 principe/noun 14 R3 28 #T->obj1:CW 56 1 67
2 standpunt/noun 9 R2 23 #T->su:CW 9 2 39
3 het/det 7 L1 13 #T->obj1:principe->det:CW 7 3 7
4 opvatting/noun 7 R1 12 #T->obj1:en->cnj:CW 4 4 5
5 de/det 3 R4 9 #T->obj1:Eerst’_principe->mod:CW 2 5 2
6 een/det 2 R5 8 #T->obj1:Patrick->mwp:CW 2 6 1
7 ik/pron 2 L2 6 #T->obj1:principe->mod:CW 2 7 1
8 mening/noun 2 L3 5 CW->body:#T 2 - 0
9 van/prep 2 R6 4 CW->mod:dat->body:word->vc:#T 2 - 0
10 aandeelhouder_schap/noun 1 L10 2 CW->mod:die->body:#T 2 - 0

Most frequent dependency paths

Figure 21 shows the most frequent dependency paths colored by sense tag. There seems to be a preference for passive constructions and combinations with a modifier for huldigen_1, and for active constructions for huldigen_2.

Figure 21. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (1 token, from huldigen_2);
  • headlines (12 tokens, mostly from huldigen_1);
  • atypical context ((10), also a headline that requires encyclopedic knowledge).

  (10) Bis algemeen 2003-08-12 Didier Wijnants Josse De Pauw huldigt muzikaal op je bek gaan Acteur en auteur Josse De Pauw heeft als

Removed tokens

One token, (11), will be removed because it is nonsensical and 9 more because they instantiate inhuldigen instead of the target lemma.

  (11) Zelfs het Journaal berichtte over 1500 zoenende landgenoten in Scheveningen . RTL 4 huldigde in Valentijn 2004 , gepresenteerd door Irene van de Laar in debotel .

HATEN

Original senses and annotations

The tokens of haten were annotated with 2 senses in 6 batches; the tags in Table 16 were suggested.

Table 16. Original definitions of ‘haten’.
Definitions
haten_1
(trans.) iem. haat toedragen, een sterk gevoel van afkeer en vijandschap t.o.v. iem. hebben: waarom haat hij mij zo?
(trans.) feel hatred, have a strong feeling of aversion and enmity towards someone: why does he hate me so much?
haten_2
(trans.) iets onaangenaam, verfoeilijk, verwerpelijk vinden: hoe zou iemand de taalkunde kunnen haten?
(trans.) consider something unpleasant, detestable, reprehensible: how could someone hate linguistics?

Figure 22 shows the sense distribution by annotator and batch and Figure 23, that of the disagreements. Figure 24 shows the sense tags that each annotator of each batch assigned to the tokens with haten_1 as majority sense and Figure 25 those for haten_2.

General distribution

The tokens seem to be split half and half between the senses, with somewhat more instances of haten_1 in the first batch. There are 8 tokens with no agreement and a few disagreements in every batch.

Of the 8 tokens with no agreement, 4 were instances of the noun haat or English hate and were removed, while the rest could be assigned one of the senses.

Figure 22. Distribution of senses of 'haten' per annotator and batch.

Figure 23. Distribution of disagreeing annotations of 'haten' per annotator and batch.

Disagreement in haten_1

There are a couple of disagreeing annotations in each batch, from diverse annotators.

Figure 24. Sense annotations of tokens with 'haten_1' as majority sense.

Disagreement in haten_2

There are very few disagreements, from diverse annotators; none in batch 3.

Figure 25. Sense annotations of tokens with 'haten_2' as majority sense.

Final senses

The final definitions are the same as the original definitions: no (sub)senses were added or modified. However, a small category was added to include 16 tokens that could belong to either haten_1 or haten_2.

Original versus final sense distribution

Of the 240 tokens of haten, 207 kept their original majority senses, 6 were corrected to another original sense, and 11 were removed. 16 tokens were assigned a new sense.

Table 17 shows in how many tokens with each majority sense which actions were taken, and Figure 26 illustrates the frequency of the final tags. Figure 27 correlates the original majority sense and the final senses.

Figure 26. Final distribution of senses of 'haten'.

Table 17. Cross-tabulation of original majority senses of ‘haten’ and actions taken.
original correct majority new remove
haten_1 2 99 9 2
haten_2 1 108 5 5
no_agreement 2 0 2 4
unclear 1 0 0 0
Figure 27. Majority and final senses of 'haten'.

Reliable cues

Table 18 shows the most frequent context words selected by the annotators as relevant. Table 19 and Table 20 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags haten_1 and haten_2.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 39 have no cues that match these criteria. 93 have one single cue and 108 have more than one (up to 6).

Across senses

The most common cues are personal pronouns and worden for haten_1 and determiners and wat for haten_2. The context words selected as cues in the rightmost columns all belong to the same token: they are all the words in the sentence where the token, here the wrong lemma, occurs, namely “Eén pennentrek gomt eeuwen haat niet weg”.

Table 18. Frequency of cues by sense, counted by type.
Rank haten_1 n haten_2 n remove n
1 hem/pron 9 het/det 18 eén/num 1
2 ze/pron 9 de/det 11 eeuw/noun 1
3 word/verb 7 ik/pron 10 gomt/noun 1
4 ik/pron 5 wat/pron 8 niet/adv 1
5 te/comp 5 dat/det 7 pennentrek/noun 1
6 de/det 4 te/comp 4 weg/noun 1
7 elkaar/pron 4 verlies/verb 3 - 0
8 hen/pron 4 winkel/verb 3 - 0
9 hij/pron 4 woord/noun 3 - 0
10 je/pron 4 dat/comp 2 - 0

haten_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest 4 slots to the left or 3 to the right of the token, up to three steps away in the dependency path (but overwhelmingly one), and mainly as direct object (#T->obj1:CW, #T->obj1:en->cnj:CW) but also as active subject (#T->su:CW) and in other roles. The fact that so many cues share a path, or at least a path length, but not a lemma indicates that there is quite a variety in the types that fill the popular slots.

Table 19. Frequency of context words as cues of haten_1 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 hem/pron 9 L1 37 #T->obj1:CW 69 1 99
2 ze/pron 9 R1 32 #T->su:CW 15 2 47
3 word/verb 7 L2 25 CW->vc:#T 7 3 20
4 ik/pron 5 L3 25 #T->obj1:en->cnj:CW 6 4 7
5 te/comp 5 R2 19 word->[vc:#T,su:CW] 6 NA 4
6 de/det 4 L4 10 CW->body:#T 5 7 3
7 elkaar/pron 4 L5 6 NA 4 5 2
8 hen/pron 4 L6 5 #T->mod:CW 3 - 0
9 hij/pron 4 R3 5 #T->mod:door->obj1:CW 3 - 0
10 je/pron 4 L8 3 #T->obj1:vader->det:CW 2 - 0

haten_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest 3 slots on any side of the target, up to three steps away in the dependency path (but overwhelmingly one), and mainly as direct object (#T->obj1:CW, #T->obj1:en->cnj:CW) but also as active subject (#T->su:CW) and in other roles. Here as well, there is a wide variety of lemmas that can fill these slots.

Table 20. Frequency of context words as cues of haten_2 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 het/det 18 R1 58 #T->obj1:CW 88 1 119
2 de/det 11 R2 37 #T->su:CW 16 2 46
3 ik/pron 10 L1 31 CW->body:#T 8 3 18
4 wat/pron 8 L3 15 #T->obj1:en->cnj:CW 5 4 6
5 dat/det 7 L2 14 ->[ROOT:#T,ROOT:CW] 2 5 5
6 te/comp 4 R3 12 #T->dp:CW 2 6 2
7 verlies/verb 3 R4 6 #T->mod:CW 2 NA 2
8 winkel/verb 3 L4 5 #T->mod:om->body:te->body:CW 2 7 1
9 woord/noun 3 R5 5 #T->obj1:wereld->det:CW 2 - 0
10 dat/comp 2 L5 4 #T->obj1:woord->det:CW 2 - 0

Most frequent dependency paths

Figure 28 shows the most frequent dependency paths colored by sense tag. The profiles of each sense are quite similar, especially if we discard the punctuation; it does seem that a lower number of haten_1 tokens has no subject.

Figure 28. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • garden-path tokens ((12) and (13), from haten_1 and haten_2 respectively);
  • headlines (2 tokens, from haten_2);
  • title (1 token, from haten_1).

  (12) We schieten enkel het overtollige wild . De sperwer wordt wel gehaat door duivenliefhebbers . Voor sperwers zijn jonge duifjes een makkelijke prooi .
  (13) Alles daarbuiten is voor mij valse en absurde Kunst , namaak ; ik haat de Leys en de Lies , de Tissots en de Comtes met hun valse naïviteit , hun onechte couleur locale en hun gewaden van zijde en gouddraad … "

Removed tokens

11 tokens will be removed: 2 are instances of the English hate, 5 of the noun haat, and 4 are partial duplicates: the same sentence, “Wat haat je?”, is repeated in different contexts that could only be distinguished by bag-of-words models without sentence boundaries, and barely at that.


DISKWALIFICEREN

Original senses and annotations

The tokens of diskwalificeren were annotated with 3 senses in 6 batches; the tags in Table 21 were suggested.

Table 21. Original definitions of ‘diskwalificeren’.
Definitions
diskwalificeren_1
(trans.) ongeschikt verklaren en uitsluiten van een bepaalde functie of positie: een getuige diskwalificeren
(trans.) declare unsuitable and exclude from a certain function or position: disqualify a witness
diskwalificeren_2
(trans.) wegens onregelmatigheden uitsluiten bij een wedstrijd: FC De Trappers werd gediskwalificeerd wegens wangedrag
(trans.) exclude from a competition because of irregularities: FC De Trappers was disqualified because of misbehaviour
diskwalificeren_3
(reflex.) zichzelf buiten spel zetten, zich onmogelijk maken: met zulk gedrag diskwalificeer je jezelf
(reflex.) exclude oneself, make oneself impossible: with such behaviour you disqualify yourself

Figure 29 shows the sense distribution by annotator and batch and Figure 30, that of the disagreements. Figure 31 shows the sense tags that each annotator of each batch assigned to the tokens with diskwalificeren_1 as majority sense, Figure 32 those for diskwalificeren_2 and Figure 33 for diskwalificeren_3.

General distribution

The second reading, diskwalificeren_2, is the most frequent one, especially in the last two batches (as expected; the sources in those batches are the Belgian newspapers, which tend to have more sport articles); the third sense is the most infrequent. There is little disagreement, with the most dissenting annotators disagreeing in only five instances; only one token presents no agreement at all and could be tagged as diskwalificeren_2, while the one with unclear as majority sense was removed.

Figure 29. Distribution of senses of 'diskwalificeren' per annotator and batch.

Figure 30. Distribution of disagreeing annotations of 'diskwalificeren' per annotator and batch.

Disagreement in diskwalificeren_1

The first sense covers about 20%-50% of each batch, with few disagreements: none in batch 3, and no more than 5 in batch 2, where disagreement was highest. The alternatives include every possible tag.

Figure 31. Sense annotations of tokens with 'diskwalificeren_1' as majority sense.

Disagreement in diskwalificeren_2

The second sense covers about 50%-80% of each batch, with the highest shares in the last two batches. There are few disagreements, mostly with diskwalificeren_1 as alternative and occasionally with unclear or the third reading.

Figure 32. Sense annotations of tokens with 'diskwalificeren_2' as majority sense.

Disagreement in diskwalificeren_3

The third sense covers 1 to 7 tokens of each batch, mostly with diskwalificeren_1 (the other non-sport-related reading) as alternative, in spite of the different argument structure (this one is reflexive instead of transitive).

Figure 33. Sense annotations of tokens with 'diskwalificeren_3' as majority sense.

Final senses

The final definitions are the same as the original definitions: no (sub)senses were added or modified.

Original versus final sense distribution

Of the 240 tokens of diskwalificeren, 230 kept their original majority senses, 8 were corrected to another original sense, and 2 were removed.

Table 22 shows in how many tokens with each majority sense which actions were taken, and Figure 34 illustrates the frequency of the final tags. Figure 35 correlates the original majority sense and the final senses.

Figure 34. Final distribution of senses of 'diskwalificeren'.

Table 22. Cross-tabulation of original majority senses of ‘diskwalificeren’ and actions taken.
original correct majority remove
diskwalificeren_1 2 64 0
diskwalificeren_2 3 145 1
diskwalificeren_3 2 21 0
no_agreement 1 0 0
unclear 0 0 1
Figure 35. Majority and final senses of 'diskwalificeren'.

Reliable cues

Table 23 shows the most frequent context words selected by the annotators as relevant. Table 24, Table 25 and Table 26 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags diskwalificeren_1, diskwalificeren_2 and diskwalificeren_3.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 29 have no cues that match these criteria. 78 have one single cue and 133 have more than one (up to 6).

Across senses

The most frequent cues of diskwalificeren_1 are the preposition als and nouns of the domain of politics, while those of diskwalificeren_2 belong mostly to the domain of sports, including the expression “valse start” (a common cause of disqualification in a sports race). The most frequent ones for diskwalificeren_3, the reflexive reading, are of course zich and zichzelf.

Table 23. Frequency of cues by sense, counted by type.
Rank diskwalificeren_1 n diskwalificeren_2 n diskwalificeren_3 n
1 als/prep 9 vals/adj 11 zich/pron 14
2 kandidaat/noun 3 start/noun 10 zichzelf/pron 7
3 partij/noun 3 finale/noun 8 met/prep 1
4 politiek/adj 3 meter/noun 6 uitlating/noun 1
5 gesprekspartner/noun 2 winnaar/noun 6 ze/pron 1
6 oud/adj 2 finish/noun 5 zijn/det 1
7 afgevaardigen/noun 1 wedstrijd/noun 5 - 0
8 als/comparative 1 atleet/noun 4 - 0
9 argument/noun 1 kampioenschap/noun 4 - 0
10 behoefte/noun 1 olympisch/adj 4 - 0

diskwalificeren_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest three slots to either side of the target, up to four steps away in the dependency path and mainly as direct object (#T->obj1:CW, #T->obj1:en->cnj:CW). The long path in the sixth row corresponds to two coordinated items in the same token, partijleden and kiezers in “…gediskwalificeerd in de ogen van veel partijleden en kiezers”.

The six cues beyond sentence boundaries belong to three tokens; in two of them, there are also (enough) cues inside the sentence, while in the third one the target occurs in a short “sentence” (“Bij voorbaat gediskwalificeerd zijn:”) followed by enumerated items.

Table 24. Frequency of context words as cues of diskwalificeren_1 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 als/prep 9 R3 10 #T->obj1:CW 12 2 20
2 kandidaat/noun 3 L2 9 NA 6 1 19
3 partij/noun 3 L6 9 #T->mod:CW 5 3 19
4 politiek/adj 3 R2 9 #T->mod:als->obj1:CW 4 4 12
5 gesprekspartner/noun 2 R1 8 #T->obj1:en->cnj:CW 3 NA 6
6 oud/adj 2 L3 7 ->[ROOT:#T,ROOT:zal->dp:in->obj1:oog->mod:van->obj1:en->cnj:CW] 2 6 5
7 afgevaardigen/noun 1 R4 6 #T->mod:van->obj1:CW 2 7 4
8 als/comparative 1 L4 5 #T->mod:wegens->obj1:besef->mod:CW 2 5 3
9 argument/noun 1 L5 5 ben->[vc:#T,su:CW] 2 9 2
10 behoefte/noun 1 L8 5 word->[vc:#T,dp:als->obj1:en->cnj:CW] 2 10 2

diskwalificeren_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the five closest slots to the left, the third closest slot to the right and the seventh and eighth closest slots to either side of the target. They can be up to six steps away in the dependency path, but are not so frequently one step away, and very often (in 50 tokens, of which 18 only have such cues) lie beyond the sentence boundary.
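
The two counts in parentheses distinguish tokens that have at least one cue beyond the sentence from tokens that have only such cues. A minimal sketch of that distinction is given below, under the assumption that a cue beyond the sentence is one without a dependency path to the target (NA in the tables, None in the code). The function name and the toy data are hypothetical.

```python
def beyond_sentence_summary(cues_by_token):
    """cues_by_token: dict token_id -> list of dependency paths (None = no path, i.e. NA).

    Returns (tokens with at least one such cue, tokens with only such cues).
    """
    some = [t for t, paths in cues_by_token.items()
            if any(p is None for p in paths)]
    only = [t for t in some if all(p is None for p in cues_by_token[t])]
    return len(some), len(only)

# Toy data (invented): token_b has no cue inside the sentence.
toy_cues = {
    "token_a": ["#T->obj1:CW", None],
    "token_b": [None, None],
    "token_c": ["word->[vc:#T,su:CW]"],
}
print(beyond_sentence_summary(toy_cues))  # (2, 1)
```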

The most frequent paths among these cues are the passive subject (word->[vc:#T,su:CW], ben->[vc:#T,su:CW]), the object linked through prepositions (#T->mod:wegens->obj1:CW and cases with in, na, bij, tijdens…), and the direct object (#T->obj1:CW).

Table 25. Frequency of context words as cues of diskwalificeren_2 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 vals/adj 11 L7 21 NA 82 NA 82
2 start/noun 10 R3 20 word->[vc:#T,su:CW] 12 2 64
3 finale/noun 8 L1 19 #T->mod:wegens->obj1:CW 11 3 53
4 meter/noun 6 L2 18 #T->obj1:CW 9 4 41
5 winnaar/noun 6 R8 18 #T->mod:in->obj1:CW 8 5 33
6 finish/noun 5 L8 17 #T->mod:na->obj1:CW 8 6 20
7 wedstrijd/noun 5 L4 16 #T->mod:bij->obj1:CW 5 1 15
8 atleet/noun 4 L3 15 ben->[vc:#T,su:CW] 5 7 11
9 kampioenschap/noun 4 L5 15 #T->mod:na->obj1:start->mod:CW 4 8 8
10 olympisch/adj 4 R5 15 #T->mod:tijdens->obj1:CW 4 9 5

diskwalificeren_3

Next to the list of types that were selected as cues, we can see that they mostly occur in the five closest slots to the left and the first to the right of the token, mainly one step away as the direct object (#T->obj1:CW). Even though there is a dedicated dependency tag for reflexive objects (se), it was not used for this verb.

The long path in the second row corresponds to the link between zich and the target lemma in “Als de kiezer dat zou moeten beoordelen, welke Kamer (of partij) diskwalificeert zich dan in de ogen van de kiezer?”. This is a parsing error.

Table 26. Frequency of context words as cues of diskwalificeren_3 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 zich/pron 14 L2 7 #T->obj1:CW 20 1 22
2 zichzelf/pron 7 L5 4 ->ROOT:als->body:zal->su:kiezer->mod:welk->[body:#T,ROOT:CW] 1 2 1
3 met/prep 1 R1 4 #T->mod:CW 1 3 1
4 uitlating/noun 1 L1 3 #T->mod:met->obj1:CW 1 6 1
5 ze/pron 1 L3 3 #T->mod:met->obj1:uitlating->det:CW 1 - 0
6 zijn/det 1 L4 1 #T->su:CW 1 - 0
7 - 0 L8 1 - 0 - 0
8 - 0 R2 1 - 0 - 0
9 - 0 R3 1 - 0 - 0
10 - 0 - 0 - 0 - 0

Most frequent dependency paths

Figure 36 shows the most frequent dependency paths colored by sense tag. Passive constructions and verbs of which the target is a complement are preferred by diskwalificeren_2, while direct objects are preferred by the other two.

Figure 36. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (1 token from diskwalificeren_1);
  • garden-path: (14) and (15), of diskwalificeren_1 and diskwalificeren_2 respectively. The former has a sport context but talks about the prestige of clubs rather than participation in a competition, while the latter co-occurs with zich but is not reflexive.
  • headlines (1 token from diskwalificeren_1);
  • atypical context (4 tokens of diskwalificeren_2, which are lists of results from competitions, but also (16) of diskwalificeren_1 with a missing object);
  • encyclopedic knowledge necessary to disambiguate: (17) and (18), where it is necessary to recognize the names of the chess players and Formula 1 racers to know that it is a sport context (diskwalificeren_2);
  • metalinguistic use (2 tokens, from diskwalificeren_2, in which the target explains an abbreviation).

  (14) een schandalig voorstel . PSV verheft zichzelf boven de rest in Nederland en diskwalificeert een club als Vitesse door te praten over een Mickey Mouse-competitie . We
  (15) Nog afgezien van zijn misdragingen buiten de ring iemand die tijdens een gevecht tot twee keer toe zijn tegenstander in het oor bijt omdat hij op punten dreigt te verliezen , en zich zo doelbewust laat diskwalificeren , zo iemand heeft gewoonweg geen heart . Misschien is boksen tegenwoordig weer
  (16) tillen we de zaak omdat we daarin wat slimmer zijn geworden . Ik diskwalificeer niet , maar laat ik zeggen dat we al wat langer boekhouden dan de Spanjaard . "
  (17) vliegtuig te zitten . Paniek bij de organisatie , Georgiev dreigde gediskwalificeerd te worden en in zijn plaats zou Sijbrands de hand mogen schudden van Van Vollenhoven .
  (18) een brief naar de commissarissen schreef . Met het verzoek beide McLarens te diskwalificeren . " Zoiets kan gewoon niet , " liet Dennis zich ontvallen .

Removed tokens

2 tokens will be removed: one because the context is not enough to disambiguate, and the other one because it is a duplicate of another token.


HERSTRUCTUREREN

Original senses and annotations

The tokens of herstructureren were annotated with 3 senses in 6 batches; the tags in Table 27 were suggested.

Table 27. Original definitions of ‘herstructureren’.
Definitions
herstructureren_1
(trans.) reorganiseren, een nieuwe structuur geven: je kunt deze tekst maar beter herstructureren
(trans.) reorganize, give a new structure: you should restructure this text
herstructureren_2
(trans.) m.b.t. bedrijven in problemen: activiteiten of personeel afstoten, downsizen: Bayer herstructureert zijn plasticdivisie
(trans.) w.r.t. businesses in difficulties: remove activities or personnel, downsize: Bayer restructures its plastic division
herstructureren_3
(intrans.) van bedrijven in problemen: activiteiten of personeel afstoten, downsizen: de chemie moet zich herstructureren
(intrans.) of businesses in difficulties: remove activities or personnel, downsize: chemistry must restructure (itself)

Figure 37 shows the sense distribution by annotator and batch and Figure 38, that of the disagreements. Figure 39 shows the sense tags that each annotator of each batch assigned to the tokens with herstructureren_1 as majority sense, Figure 40 for those of herstructureren_2 and Figure 41 for herstructureren_3.

General distribution

The sense distribution is anything but stable, both between and within batches. In some batches, as many as 10% of the tokens have no agreement at all and some annotators dissent in about 50% of their annotations, but most of the disagreement concerns tokens with either herstructureren_2 or herstructureren_3 (the “business” readings) as majority sense.

A total of 17 (7.08%) tokens have no agreement, but none have a geen majority sense. They could all be retagged to one of the senses.

Figure 37. Distribution of senses of 'herstructureren' per annotator and batch.

Figure 38. Distribution of disagreeing annotations of 'herstructureren' per annotator and batch.

Disagreement in herstructureren_1

This reading covers about 10%-30% of each batch, with some alternative annotations of the other senses, especially from annotator 3 of batch 1 and annotator 2 of batch 3.

Figure 39. Sense annotations of tokens with 'herstructureren_1' as majority sense.

Disagreement in herstructureren_2

This reading covers 25%-50% of each batch, although a large portion of these tokens received the intransitive counterpart (herstructureren_3) as alternative, particularly from certain annotators, and some others received the other transitive tag (herstructureren_1).

Figure 40. Sense annotations of tokens with 'herstructureren_2' as majority sense.

Disagreement in herstructureren_3

This reading covers about 10%-50% of each batch, but a large section received herstructureren_2 as alternative, especially from certain annotators.

Figure 41. Sense annotations of tokens with 'herstructureren_3' as majority sense.

Final senses

The final definitions are the same as the original definitions: no (sub)senses were added or modified.

Original versus final sense distribution

Of the 240 tokens of herstructureren, 165 kept their original majority senses, 75 were corrected to another original sense, and none were removed.
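
To put these figures next to those of the other lemmas described in this part of the report, the sketch below recomputes, for each lemma, the share of tokens that kept their majority sense and the share that was corrected. The counts are copied from the corresponding "Original versus final sense distribution" paragraphs; the data layout and variable names are our own illustration.

```python
# (kept, corrected, new, removed) per lemma, as reported in the text
counts = {
    "huldigen": (229, 1, 0, 10),
    "haten": (207, 6, 16, 11),
    "diskwalificeren": (230, 8, 0, 2),
    "herstructureren": (165, 75, 0, 0),
}

shares = {}
for lemma, (kept, corrected, new, removed) in counts.items():
    total = kept + corrected + new + removed  # 240 tokens for every lemma
    shares[lemma] = {"kept": kept / total, "corrected": corrected / total}

for lemma, s in shares.items():
    print(f"{lemma}: kept {s['kept']:.1%}, corrected {s['corrected']:.1%}")
```

With these counts, herstructureren has by far the largest share of corrected tokens (75 of 240), in line with the unstable annotations described for this lemma.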

Table 28 shows in how many tokens with each majority sense which actions were taken, and Figure 42 illustrates the frequency of the final tags. Figure 43 correlates the original majority sense and the final senses.

Figure 42. Final distribution of senses of 'herstructureren'.

Table 28. Cross-tabulation of original majority senses of ‘herstructureren’ and actions taken.
original correct majority
herstructureren_1 6 39
herstructureren_2 25 68
herstructureren_3 27 58
no_agreement 17 0
Figure 43. Majority and final senses of 'herstructureren'.

Reliable cues

Table 29 shows the most frequent context words selected by the annotators as relevant. Table 30, Table 31 and Table 32 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags herstructureren_1, herstructureren_2 and herstructureren_3.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 85 have no cues that match these criteria. 68 have one single cue and 87 have more than one (up to 10).

Across senses

As would be expected, the lemmas that were selected as cues for herstructureren_1 are different from those in the other two senses, which share bedrijf and baan. However, they are very infrequent: this could be due to the variety of lemmas, but also to the amount of disagreement between the annotators, which lowers the chances of agreement in both sense tag and cue selection. Other than bedrijf as cue for the “business” senses, two lemmas stand out that actually represent syntactic constructions: the verb worden for herstructureren_2 and the particle te for herstructureren_3.

Table 29. Frequency of cues by sense, counted by type.
Rank herstructureren_1 n herstructureren_2 n herstructureren_3 n
1 schuld/noun 3 word/verb 14 te/comp 8
2 het/det 2 bedrijf/noun 9 bedrijf/noun 7
3 kruispunt/noun 2 te/comp 9 om/comp 5
4 aantal/noun 1 het/det 7 moet/verb 4
5 administratie/noun 1 zijn/det 4 zich/pron 4
6 bedrijf_terrein/noun 1 activiteit/noun 3 ben/verb 3
7 belang/noun 1 moet/verb 3 dat/comp 3
8 boek/noun 1 afdeling/noun 2 het/det 3
9 Bornem_centrum/noun 1 baan/noun 2 baan/noun 2
10 choreografie/noun 1 de/det 2 fabriek/noun 2

herstructureren_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest three slots to either side of the target, up to three steps away in the dependency path, and as direct object (#T->obj1:CW) of the target, but also as passive subject and in other relations.

Table 30. Frequency of context words as cues of herstructureren_1 by attribute.
Column groups (each with its own count n): Type, Position, Dependency path, Path length
Rank cw_type n position n path n steps n
1 schuld/noun 3 L2 13 #T->obj1:CW 18 1 22
2 het/det 2 L1 6 word->[vc:#T,su:CW] 5 3 13
3 kruispunt/noun 2 R2 6 #T->mod:van->obj1:CW 3 2 11
4 aantal/noun 1 R3 5 moet->vc:word->[vc:#T,su:CW] 2 4 4
5 administratie/noun 1 L6 4 ->[ROOT:#T,ROOT:wil->dp:CW] 1 5 3
6 bedrijf_terrein/noun 1 L3 3 #T->det:CW 1 6 3
7 belang/noun 1 L4 3 #T->mod:CW 1 7 1
8 boek/noun 1 L11 2 #T->mod:om->body:te->body:word->predc:CW 1 9 1
9 Bornem_centrum/noun 1 L12 2 #T->mod:om->obj1:CW 1 10 1
10 choreografie/noun 1 R1 2 #T->mod:tot->body:CW 1 0

herstructureren_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest six or seven slots to the left of the target and the first to the right, up to 3 or 4 steps in the dependency path, mainly as direct object (#T->obj1:CW) of the target or verb of which the target is a complement (CW->vc:#T, mostly worden but also hebben and moeten) but also in the construction “te herstructureren” (CW->body:#T) or as passive subject (word->[vc:#T,su:CW]).
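The path notation used in these tables can be unpacked mechanically. The following is a minimal sketch, not the procedure used for this report: the helper names path_labels and path_steps are hypothetical, and equating the number of relation labels in a path with the value in the “steps” column is an assumption.

```python
import re

# Hypothetical helpers for illustration only.
# A relation label is any identifier directly followed by ':' (obj1:, vc:, su:, ...).
LABEL = re.compile(r"([A-Za-z0-9_]+):")

def path_labels(path):
    """Return the relation labels in a path; None for cues outside the sentence (NA)."""
    if path == "NA":
        return None
    return LABEL.findall(path)

def path_steps(path):
    """Assumption: the number of steps equals the number of relation labels."""
    labels = path_labels(path)
    return None if labels is None else len(labels)

examples = [
    "#T->obj1:CW",           # cue is the direct object of the target
    "CW->vc:#T",             # cue is the verb of which the target is a complement
    "word->[vc:#T,su:CW]",   # cue is the passive subject
    "#T->mod:van->obj1:CW",  # cue is embedded in a modifier of the target
    "NA",                    # cue lies beyond the sentence
]
for p in examples:
    print(p, path_labels(p), path_steps(p))
```

Under this reading, the direct object is one step away from the target, while the passive subject is two steps away because the path goes through the auxiliary worden.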

The five cues beyond the sentence correspond to three tokens: in two of them, banen and verdwijnen are indeed good indicators of the “business” readings but they occur in a different sentence from the target; in the third one, for some unexplainable reason two annotators agreed both on the sense and on a context word from a different sense that is not related to the target.5

Table 31. Frequency of context words as cues of herstructureren_2 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 word/verb 14 L2 34 #T->obj1:CW 36 1 68
2 bedrijf/noun 9 L1 32 CW->vc:#T 15 2 38
3 te/comp 9 L3 20 CW->body:#T 9 3 16
4 het/det 7 L4 9 word->[vc:#T,su:CW] 7 4 11
5 zijn/det 4 L5 7 NA 5 NA 5
6 activiteit/noun 3 L6 6 #T->su:CW 4 5 4
7 moet/verb 3 R1 6 #T->mod:CW 2 6 2
8 afdeling/noun 2 L7 5 CW->cnj:#T 2 9 2
9 baan/noun 2 L8 4 CW->vc:word->vc:#T 2 12 1
10 de/det 2 R2 4 word->vc:of->[cnj:#T,su:CW] 2 0

herstructureren_3

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest seven slots to the left of the target, up to three or four steps away in the dependency path. The most frequent path between a cue and the target seems to be one where they are both the root of the sentence. This occurs in 9 different sentences that must have confused the automatic parser; the wordforms of these cues are: ABX, dat, verklaarde, verkocht, te, is, aan, het, bedrijven, Hyperport, Palm.

Table 32. Frequency of context words as cues of herstructureren_3 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 te/comp 8 L1 27 ->[ROOT:#T,ROOT:CW] 12 2 37
2 bedrijf/noun 7 L2 17 #T->obj1:CW 9 3 28
3 om/comp 5 L3 16 CW->body:#T 8 1 27
4 moet/verb 4 L5 13 CW->mod:om->body:te->body:#T 5 4 13
5 zich/pron 4 L4 10 NA 5 5 8
6 ben/verb 3 L6 8 CW->body:te->body:#T 4 NA 5
7 dat/comp 3 L7 6 CW->vc:#T 3 6 2
8 het/det 3 R1 3 en->[cnj:#T,cnj:CW] 3 8 2
9 baan/noun 2 R2 3 #T->su:CW 2 7 1
10 fabriek/noun 2 R3 3 ben->vc:aan->[body:#T,su:CW] 2 9 1

Most frequent dependency paths

Figure 44 shows the most frequent dependency paths colored by sense tag. The only paths that seem to occur in at least half the tokens of some sense are the punctuation mark, the direct object and the modifier, which are dispreferred by herstructureren_3.

Figure 44. Tokens per path.

Tracking lists

  • nominalizations (10 tokens, mostly from herstructureren_1 but also from the other senses);
  • headlines (9 tokens, from all senses);
  • atypical context (1 token of herstructureren_3 with an atypical object, namely leger);
  • encyclopedic knowledge necessary to disambiguate ((19) and (20), where knowing what NDF, VWS and INDA stand for helps select herstructureren_1);
  • the object is zich(zelf) and variations (10 cases, half annotated as herstructureren_1 and the other half as herstructureren_3, because the example in the original definition was reflexive).
  (19) krijgen , gaat er aan de andere kant vanaf . ’ De NDF herstructureerde zich op last van VWS . Die reorganisatie was een geweldig karwei .
  (20) Minister van Cultuur Giovanna Melandri onderstreepte in een reactie op de arrestaties dat het Inda in 1998 is geherstructureerd , met onder andere verandering van de voltallige leiding .

Removed tokens

No tokens of herstructureren will be removed.


HERINNEREN

Original senses and annotations

The tokens of herinneren were annotated with 3 senses in 6 batches; the tags in Table 33 were suggested.

Table 33. Original definitions of ‘herinneren’.
Definitions
herinneren_1
(met ‘aan’) weer te binnen brengen, in het geheugen terugroepen: iemand aan iets herinneren
(with aan ‘of’) bring back to the mind, to the memory: remind someone of something
herinneren_2
(reflex.) in het geheugen aanwezig hebben, niet vergeten: zich een gebeurtenis, een persoon herinneren
(reflex.) have present in the memory, not forget: remember an event, a person
herinneren_3
(trans.) met een plechtigheid, monument o.i.d. gedenken: we herinneren vandaag de Slag bij Ronceval
(trans.) remember with a celebration, monument and such: today we remember the Battle of Roncevaux Pass

Figure 45 shows the sense distribution by annotator and batch and Figure 46, that of the disagreements. Figure 47 shows the sense tags that each annotator of each batch assigned to the tokens with herinneren_1 as majority sense, and Figure 48 those for herinneren_2, while herinneren_3 was too infrequent to require a plot.

General distribution

The second sense is always the most frequent and the third one the least frequent; the latter is rarely agreed on at all. The distribution across annotators within batches is relatively stable, and everyone disagrees in at most 10% of their annotations, except for the first annotator of batch 4, who disagrees in almost 50% of the cases. There are only 2 cases with no agreement at all, both in batch 2; they were both assigned herinneren_3. No tokens had a geen tag as majority sense.
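The notions of majority sense and disagreement used here can be made concrete with a small sketch. It follows the glossary definition (a majority sense needs at least two agreeing annotators), but the function names, the toy data and two decisions are assumptions for illustration: a 2-2 tie among four annotators is treated as no_agreement, and tokens without agreement do not count as disagreements for anyone.

```python
from collections import Counter

def majority_sense(tags):
    """Return the tag chosen by at least two annotators, else 'no_agreement'.

    Assumption: a tie between two equally frequent tags counts as no agreement.
    """
    counts = Counter(tags).most_common()
    top_tag, top_n = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_n
    if top_n < 2 or tied:
        return "no_agreement"
    return top_tag

def disagreement_rate(tokens, annotator):
    """Share of an annotator's tokens where their tag differs from the majority sense.

    tokens: list of dicts mapping annotator id -> sense tag (one dict per token).
    Assumption: tokens with no agreement are not counted as disagreements.
    """
    own = [tok for tok in tokens if annotator in tok]
    if not own:
        return 0.0
    dissent = sum(
        1 for tok in own
        if majority_sense(list(tok.values())) not in ("no_agreement", tok[annotator])
    )
    return dissent / len(own)

# Invented toy batch of three tokens, three annotators each.
batch = [
    {"a1": "herinneren_2", "a2": "herinneren_2", "a3": "herinneren_2"},
    {"a1": "herinneren_1", "a2": "herinneren_2", "a3": "herinneren_2"},
    {"a1": "herinneren_1", "a2": "herinneren_2", "a3": "herinneren_3"},
]
print([majority_sense(list(tok.values())) for tok in batch])
print(disagreement_rate(batch, "a1"))
```

In the toy batch, the first annotator dissents on one of three tokens; the third token has no majority sense at all.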

Figure 45. Distribution of senses of 'herinneren' per annotator and batch.

Figure 46. Distribution of disagreeing annotations of 'herinneren' per annotator and batch.

Disagreement in herinneren_1

This sense covers about 20%-50% of each batch; in each batch one annotator disagrees with a number of annotations, suggesting either herinneren_2 or herinneren_3 as alternative.

Figure 47. Sense annotations of tokens with 'herinneren_1' as majority sense.

Disagreement in herinneren_2

This sense covers at least half the tokens of each batch, and once beyond three quarters. There are few disagreeing annotations, with the remarkable outlier of annotator one in batch 4, who suggested herinneren_1 as alternative for about half their annotations.

Figure 48. Sense annotations of tokens with 'herinneren_2' as majority sense.

Disagreement in herinneren_3

There are only two tokens with this as majority sense: one in batch 1, with herinneren_1 as alternative, and one in batch 4, with herinneren_2 as alternative. The former actually was retagged as herinneren_1, while the latter does match herinneren_3.

Final senses

One definition, the one for herinneren_3, differs from the original one, based on the actual occurrences of the corpus, so that the final senses are the ones in Table 34. Still, that new reading is extremely infrequent.

Table 34. Final definitions of ‘herinneren’.
code Definition
herinneren_1 (with aan ‘of’) bring back to the mind, to the memory
herinneren_2 (reflex.) have present in the memory, not forget
herinneren_3 (trans.) in the construction “herinnerd worden als”, keep in the collective memory

Original versus final sense distribution

Of the 240 tokens of herinneren, 235 kept their original majority senses, 5 were corrected to another original sense, and none were removed.

Table 35 shows in how many tokens with each majority sense which actions were taken, and Figure 49 illustrates the frequency of the final tags. Figure 50 correlates the original majority sense and the final senses.

Figure 49. Final distribution of senses of 'herinneren'.

Table 35. Cross-tabulation of original majority senses of ‘herinneren’ and actions taken.
original correct majority
herinneren_1 0 75
herinneren_2 2 159
herinneren_3 1 1
no_agreement 2 0
Figure 50. Majority and final senses of 'herinneren'.

Reliable cues

Table 36 shows the most frequent context words selected by the annotators as relevant. Table 37 and Table 38 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags herinneren_1 and herinneren_2.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 10 have no cues that match these criteria. 164 have one single cue and 66 have more than one (up to 6).
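The selection criterion for reliable cues can be sketched as follows. This is an illustration under assumptions, not the script used for the report: the function name reliable_cues, the input layout (one pair of sense tag and selected context words per annotator) and the toy token are all hypothetical.

```python
from collections import Counter

def reliable_cues(selections, final_sense, min_annotators=2):
    """Keep context words picked by at least `min_annotators` annotators
    who also assigned the final sense.

    selections: list of (sense_tag, set_of_context_words), one per annotator.
    """
    votes = Counter()
    for sense, cues in selections:
        if sense == final_sense:      # ignore annotators with another tag
            votes.update(set(cues))   # one vote per annotator per word
    return {cw for cw, n in votes.items() if n >= min_annotators}

# Invented token with herinneren_1 as final sense.
token = [
    ("herinneren_1", {"aan/prep", "de/det"}),
    ("herinneren_1", {"aan/prep"}),
    ("herinneren_2", {"aan/prep", "me/pron"}),
]
cues = reliable_cues(token, "herinneren_1")
print(cues)

# Bucket a token by its number of reliable cues, as in the counts reported above.
bucket = "none" if not cues else "one" if len(cues) == 1 else "more"
print(bucket)
```

Here only aan/prep survives: de/det was selected by a single agreeing annotator, and the selections of the annotator who chose herinneren_2 are not counted.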

Across senses

The cues that distinguish these readings are mostly function words: aan and eraan for herinneren_1 and reflexive pronouns for herinneren_2, which is to be expected given that they are defined by such structures.

Table 36. Frequency of cues by sense, counted by type.
Rank herinneren_1 n herinneren_2 n herinneren_3 n
1 aan/prep 58 zich/pron 94 word/verb 1
2 eraan/pp 11 me/pron 47 0
3 word/verb 3 ik/pron 18 0
4 de/det 2 mij/pron 8 0
5 er/noun 2 kan/verb 5 0
6 me/pron 2 een/det 3 0
7 te/comp 2 hij/pron 3 0
8 waaraan/pp 2 je/pron 3 0
9 bewindsman/noun 1 dat/comp 2 0
10 bij/prep 1 goed/adj 2 0

herinneren_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest two slots to either side of the target, one step away in the dependency path, and mainly as prepositional complement (#T->pc:CW, filled in mostly by aan but also eraan and daaraan).

Table 37. Frequency of context words as cues of herinneren_1 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 aan/prep 58 R1 36 #T->pc:CW 63 1 85
2 eraan/pp 11 L1 15 #T->mod:CW 8 2 10
3 word/verb 3 R2 13 #T->su:CW 5 3 6
4 de/det 2 L2 10 #T->obj1:CW 4 4 3
5 er/noun 2 L5 5 CW->vc:#T 3 0
6 me/pron 2 L3 4 #T->pc:aan->obj1:CW 2 0
7 te/comp 2 L4 4 CW->body:#T 2 0
8 waaraan/pp 2 R3 4 ->[ROOT:#T,ROOT:procedure->dp:mij->mod:CW] 1 0
9 bewindsman/noun 1 L6 3 ->ROOT:sla_terug->[dp:#T,ROOT:CW] 1 0
10 bij/prep 1 R4 3 #T->mod:aan->obj1:CW 1 0

herinneren_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the first slot to the right of the target, but also up to three slots away, one step away in the dependency path, and mainly as reflexive complement (#T->se:CW), although the subject and some adverbial complements were also selected.

Table 38. Frequency of context words as cues of herinneren_2 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 zich/pron 94 R1 68 #T->se:CW 148 1 206
2 me/pron 47 R2 34 #T->su:CW 30 2 22
3 ik/pron 18 L1 28 #T->mod:CW 10 3 3
4 mij/pron 8 L2 24 #T->obj1:CW 8 4 1
5 kan/verb 5 L3 15 kan->[vc:#T,su:CW] 6 0
6 een/det 3 L4 12 CW->vc:#T 5 0
7 hij/pron 3 R3 12 CW->body:#T 3 0
8 je/pron 3 L5 9 ->[ROOT:#T,ROOT:CW] 2 0
9 dat/comp 2 L6 7 #T->vc:CW 2 0
10 goed/adj 2 L8 6 hoe->[dp:#T,dp:CW] 2 0

Most frequent dependency paths

Figure 51 shows the most frequent dependency paths colored by sense tag. The subject (#T->su:CW) and reflexive complement (#T->se:CW) are clearly preferred by herinneren_2, while the prepositional complement and its derivations (#T->pc:CW, #T->pc:X->obj1:CW, etc) go with herinneren_1.

Figure 51. Tokens per path.

Tracking lists

Only one list was compiled, with one element: (21), which semantically matches the first sense but lacks the preposition; as it turns out, its direct object is a personal pronoun that was parsed as a reflexive complement.

  (21) Ik gebruikte ze in Manchester en bracht ze mee naar mijn woonplaats Biarritz om me blijvend te herinneren dat ik een valsspeler ben . " " Op Millars vraag schreef de

Removed tokens

No token of herinneren will be removed.


HERHALEN

Original senses and annotations

The tokens of herhalen were annotated with 3 senses in 8 batches; the tags in Table 39 were suggested.

Table 39. Original definitions of ‘herhalen’.
Definitions
herhalen_1
(trans.) m.b.t. handelingen of activiteiten: opnieuw uitvoeren: een experiment, een les, een bezoek herhalen
(trans.) w.r.t. acts or activities: perform again: repeat an experiment, a lesson, a visit
herhalen_2
(trans.) m.b.t. zinnen, boodschappen e.d.: opnieuw uitspreken: kunt u dat even herhalen?
(trans.) w.r.t. utterances, messages and such: pronounce again: Could you please repeat that?
herhalen_3
(reflex.) zich opnieuw voordoen: de geschiedenis herhaalt zich
(reflex.) occur again: history repeats itself

Figure 52 shows the sense distribution by annotator and batch and Figure 53, that of the disagreements. Figure 54 shows the sense tags that each annotator of each batch assigned to the tokens with herhalen_1 as majority sense, Figure 55 that for herhalen_2 and Figure 56 that for herhalen_3.

General distribution

The sense distribution is relatively stable across and within batches: herhalen_2 is the most frequent reading, and herhalen_3 is quite infrequent. Almost all annotators disagree at some point with their colleagues, on any sense.

Nine tokens had no agreement and two had not_listed as majority sense; they were either assigned herhalen_1 or the new herhalen_4, or removed.

Figure 52. Distribution of senses of 'herhalen' per annotator and batch.

Figure 53. Distribution of disagreeing annotations of 'herhalen' per annotator and batch.

Disagreement in herhalen_1

In almost all batches there are a couple of disagreements with herhalen_2, but what jumps out the most are the not_listed annotations of the first annotator of batch 5. Most of these correspond to a new herhalen_4 ‘broadcast again’ sense.

Figure 54. Sense annotations of tokens with 'herhalen_1' as majority sense.

Disagreement in herhalen_2

This sense covers about 50% of each batch and has some, but not that many, alternative annotations, with any tag as alternative.

Figure 55. Sense annotations of tokens with 'herhalen_2' as majority sense.

Disagreement in herhalen_3

This is the least frequent of the senses, covering 5%-20% of each batch. Very few tokens have alternative annotations, and they always correspond to herhalen_1.

Figure 56. Sense annotations of tokens with 'herhalen_3' as majority sense.

Final senses

One definition (herhalen_4) was added, based on the actual occurrences of the corpus and the annotators’ suggestions, so that the final senses are the ones in Table 40.

Table 40. Final definitions of ‘herhalen’.
code Definition
herhalen_1 (trans.) w.r.t. acts or activities: perform again
herhalen_2 (trans.) w.r.t. utterances, messages and such: pronounce again
herhalen_3 (reflex.) occur again
herhalen_4 (trans.) of a show or an episode, broadcast again

Original versus final sense distribution

Of the 320 tokens of herhalen, 275 kept their original majority senses, 11 were corrected to another original sense, and 7 were removed. 27 tokens were assigned a new sense.
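These totals can be checked against the cross-tabulation in Table 41. The sketch below simply re-adds the cells of that table; the variable names are illustrative, and the dictionary layout is an assumption about how one might store the table.

```python
from collections import Counter

# Counts copied from Table 41: original majority sense -> action taken.
table_41 = {
    "herhalen_1":   {"correct": 1, "majority": 80,  "new": 20, "remove": 3},
    "herhalen_2":   {"correct": 4, "majority": 159, "new": 2,  "remove": 2},
    "herhalen_3":   {"correct": 2, "majority": 36,  "new": 0,  "remove": 0},
    "no_agreement": {"correct": 4, "majority": 0,   "new": 3,  "remove": 2},
    "not_listed":   {"correct": 0, "majority": 0,   "new": 2,  "remove": 0},
}

# Column totals: how many tokens underwent each action.
column_totals = Counter()
for actions in table_41.values():
    column_totals.update(actions)

# Row totals: how many tokens had each original majority sense.
row_totals = {sense: sum(actions.values()) for sense, actions in table_41.items()}
grand_total = sum(row_totals.values())

print(dict(column_totals))  # {'correct': 11, 'majority': 275, 'new': 27, 'remove': 7}
print(row_totals)
print(grand_total)          # 320
```

The column totals reproduce the figures given in the text (275 kept, 11 corrected, 27 new, 7 removed), and the row totals show the 9 tokens without agreement and the 2 tokens with not_listed as majority sense.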

Table 41 shows in how many tokens with each majority sense which actions were taken, and Figure 57 illustrates the frequency of the final tags. Figure 58 correlates the original majority sense and the final senses.

Figure 57. Final distribution of senses of 'herhalen'.

Table 41. Cross-tabulation of original majority senses of ‘herhalen’ and actions taken.
original correct majority new remove
herhalen_1 1 80 20 3
herhalen_2 4 159 2 2
herhalen_3 2 36 0 0
no_agreement 4 0 3 2
not_listed 0 0 2 0
Figure 58. Majority and final senses of 'herhalen'.

Reliable cues

Table 42 shows the most frequent context words selected by the annotators as relevant. Table 43, Table 44 and Table 45 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags herhalen_1, herhalen_2 and herhalen_3.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 320 tokens, 52 have no cues that match these criteria. 127 have one single cue and 141 have more than one (up to 7).

Across senses

The strongest profile is that of herhalen_3, which strongly correlates with the reflexive pronoun and its most frequent subject, geschiedenis. For herhalen_1, the passive construction and nouns designating actions/performances are relatively frequent, while for herhalen_2 the subordinating conjunction dat, pronouns and speech-related lexemes like woord and standpunt are typical cues.

Table 42. Frequency of cues by sense, counted by type.
Rank herhalen_1 n herhalen_2 n herhalen_3 n herhalen_4 n
1 de/det 5 dat/comp 25 zich/pron 30 radio/noun 1
2 prestatie/noun 5 de/det 8 geschiedenis/noun 14 Teleac_serie/noun 1
3 word/verb 5 het/det 8 scenario/noun 4 zend_uit/verb 1
4 handeling/noun 4 hij/pron 8 de/det 2 0
5 actie/noun 3 ik/pron 8 dat/det 1 0
6 dit/det 3 woord/noun 8 discussie/noun 1 0
7 experiment/noun 3 heb/verb 7 dit/det 1 0
8 te/comp 3 standpunt/noun 7 doem_scenario/noun 1 0
9 zijn/det 3 eerder/adj 6 drama/noun 1 0
10 beweging/noun 2 zeg/verb 6 gebeurtenis/noun 1 0

herhalen_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest 6 slots to the left of the target but also the first two to the right, up to two steps away in the dependency path and mainly as direct object (#T->obj1:CW) but also verb of which the target is a complement (CW->vc:#T, mainly filled by worden) or passive subject (word->[vc:#T,su:CW]) of the target.

The 8 cues beyond the sentence belong to 6 tokens: in all cases, the theme (what is being repeated) is either elided or referred to by a pronoun within the sentence of the target, but can be extracted from neighboring sentences.

Table 43. Frequency of context words as cues of herhalen_1 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 de/det 5 L1 19 #T->obj1:CW 46 1 68
2 prestatie/noun 5 L2 15 CW->vc:#T 8 2 33
3 word/verb 5 L5 15 NA 8 3 12
4 handeling/noun 4 L3 14 #T->mod:CW 5 4 9
5 actie/noun 3 L6 10 word->[vc:#T,su:CW] 5 NA 8
6 dit/det 3 L4 9 #T->su:CW 3 5 3
7 experiment/noun 3 R2 9 CW->body:#T 3 6 2
8 te/comp 3 R1 8 #T->obj1:en->cnj:CW 2 7 1
9 zijn/det 3 L7 6 #T->obj1:fout->det:CW 2 9 1
10 beweging/noun 2 L8 6 #T->obj1:handeling->det:CW 2 10 1

herhalen_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest 3 slots to either side of the target, one or maybe two steps away and mainly as direct object (#T->obj1:CW) but also as verbal complement (#T->vc:CW, mainly filled by dat, but also wat) or subject (#T->su:CW) of the target.

The 7 cues beyond the sentence belong to 5 sentences. In one of them, there is also another (sufficient) cue within the sentence; in two, the same cue occurs inside and outside the sentence and the latter was registered probably because of the known bug. In the other two, the object is a pronoun with a previous clause (of reported speech) as antecedent: the selected cues are part of the reported speech, but don’t help disambiguate beyond that particular relation.

Table 44. Frequency of context words as cues of herhalen_2 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 dat/comp 25 R1 57 #T->obj1:CW 66 1 159
2 de/det 8 R2 40 #T->vc:CW 30 2 80
3 het/det 8 L1 30 #T->su:CW 26 3 46
4 hij/pron 8 R3 26 #T->mod:CW 19 4 19
5 ik/pron 8 L2 22 #T->mod:in->obj1:CW 9 NA 7
6 woord/noun 8 L3 21 CW->vc:#T 8 5 3
7 heb/verb 7 L4 17 #T->vc:wat->body:heb->vc:CW 7 6 2
8 standpunt/noun 7 L5 12 word->[vc:#T,su:CW] 7 0
9 eerder/adj 6 R4 11 NA 7 0
10 zeg/verb 6 R6 10 heb->[vc:#T,su:CW] 5 0

herhalen_3

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest slot to either side of the target, one or maybe two steps away in the dependency path and mainly as reflexive complement (#T->se:CW) but also as subject of the target (#T->su:CW).

Table 45. Frequency of context words as cues of herhalen_3 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 zich/pron 30 R1 17 #T->se:CW 30 1 49
2 geschiedenis/noun 14 L1 10 #T->su:CW 17 2 15
3 scenario/noun 4 L2 9 zal->[vc:#T,su:CW] 3 3 5
4 de/det 2 L3 7 #T->su:geschiedenis->det:CW 2 4 1
5 dat/det 1 L4 6 #T->su:scenario->det:CW 2 5 1
6 discussie/noun 1 R2 6 lijk->vc:te->[body:#T,su:CW] 2 0
7 dit/det 1 R3 5 mag->[vc:#T,su:CW] 2 0
8 doem_scenario/noun 1 L5 3 #T->mod:CW 1 0
9 drama/noun 1 L6 2 #T->su:drama->det:CW 1 0
10 gebeurtenis/noun 1 L7 2 #T->su:geschiedenis->mod:CW 1 0

Most frequent dependency paths

Figure 59 shows the most frequent dependency paths colored by sense tag. The reflexive complement (#T->se:CW) is clearly linked to herhalen_3 and the subject seems to be more frequent with herhalen_2 than with herhalen_1 and herhalen_4, which tend to have modifiers and be used as verbal complement.

Figure 59. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (1 token, from herhalen_1);
  • garden-path tokens ((22), of herhalen_1, where geschiedenis is the object of a transitive construction with a different subject instead of the subject of a reflexive one);
  • atypical contexts: (23), of herhalen_1, where the object is missing, and (24), which is in verse;
  • headlines (3 tokens, from herhalen_1 and herhalen_2);
  • tokens with zich in a non-reflexive construction: in 5 tokens of herhalen_1, the object is a reflexive pronoun, and what is being repeated is someone’s artistic performance, with the added nuance of lack of creativity.
  (22) in Amersfoort . ’ Wie de geschiedenis niet kent is gedwongen haar te herhalen ’ , is de algemene wijsheid . De IRA en de protestanten ,
  (23) goed dat het cultuurseizoen op zijn eind loopt . De kunstkletsprogramma’s zwijgen of herhalen , de laatste prijzen zijn uitgereikt , de recensenten gaan op vakantie .
  (24) ’ Nu ik op dit filmpje van taal / het gebeuren voor je herhaal , ’ schreef hij al in 1968 in ’ Landschap voor een dode meneer ’ .

Removed tokens

7 tokens will be removed: one because it is a duplicate of another token, and the rest because there is not enough context to distinguish between herhalen_1 and herhalen_2.


HELPEN

Original senses and annotations

The tokens of helpen were annotated with 3 senses in 6 batches; the tags in Table 46 were suggested.

Table 46. Original definitions of ‘helpen’.
Definitions
helpen_1
(trans.) ondersteunen in materiële of morele zin, bijstaan: met raad en daad helpen, een helpende hand, uit de nood helpen
(trans.) support in material or moral sense, assist: help in word and deed, a helping hand, help out
helpen_2
(trans.) iem. assisteren door met hem samen te werken: helpen met het huiswerk; heb je dat alleen gedaan of heeft iemand je geholpen?
(trans.) assist someone by collaborating with them: help with homework, did you do that by yourself or did someone help you?
helpen_3
(intrans.) voordeel opleveren, nuttig zijn: dat drankje heeft goed geholpen
(intrans.) yield advantage, be useful: that drink helped a lot

Figure 60 shows the sense distribution by annotator and batch and Figure 61, that of the disagreements. Figure 62 shows the sense tags that each annotator of each batch assigned to the tokens with helpen_1 as majority sense, Figure 63 those for helpen_2 and Figure 64 for helpen_3.

General distribution

The sense distribution both across and within batches is quite unstable, although helpen_1 tends to be the most frequent sense and helpen_2 the least frequent. Every annotator disagrees in about 25% of their annotations, mostly in tokens with helpen_1 or helpen_3 as majority sense.

15 tokens had no agreement, but all but one (which was removed) could be assigned a tag. The 10 tokens with not_listed as majority sense were either assigned a new tag or removed, and the one with wrong_lemma as majority sense was removed.

Figure 60. Distribution of senses of 'helpen' per annotator and batch.

Figure 61. Distribution of disagreeing annotations of 'helpen' per annotator and batch.

Disagreement in helpen_1

This sense covers about 30%-60% of each batch, with a number of cases of helpen_2 as alternative, or for annotator 1 of batch 3, not_listed.

Figure 62. Sense annotations of tokens with 'helpen_1' as majority sense.

Disagreement in helpen_2

This sense covers about 15%-25% of each batch, with a number of dissenting annotations with helpen_1 as alternative, especially from annotator 1 of batch 3.

Figure 63. Sense annotations of tokens with 'helpen_2' as majority sense.

Disagreement in helpen_3

This sense covers about 12%-40% of each batch, with some disagreeing annotations suggesting mostly helpen_1 but also helpen_2 and not_listed as alternatives.

Figure 64. Sense annotations of tokens with 'helpen_3' as majority sense.

Final senses

After the annotation, the definitions of helpen changed: helpen_4 and helpen_5 were added to capture, respectively, an intransitive construction similar to helpen_1 but with inanimate entities and a construction with aan meaning “to provide”. While 21 tokens (mostly of helpen_1) instantiate a resultative construction with a preposition or adverb, the aan case warrants its own category because of its frequency and the tendency of the annotators to suggest a separate sense for it. The final definitions are shown in Table 47.

Table 47. Final definitions of ‘helpen’.
code Definition
helpen_1 (trans.) support in material or moral sense, assist
helpen_2 (trans.) assist someone by collaborating with them
helpen_3 (intrans.) yield advantage, be useful
helpen_4 (trans.) with inanimate entities, be helpful, useful
helpen_5 (with aan) provide

In addition, one idiom was identified that cannot be subsumed under any of the other senses, namely “om zeep helpen” ‘to kill’. There are 7 tokens belonging to this category.

Original versus final sense distribution

Of the 240 tokens of helpen, 143 kept their original majority senses, 39 were corrected to another original sense, and 7 were removed. 38 tokens were assigned a new sense; 13 tokens were identified as instances of some idiomatic expression.

Table 48 shows in how many tokens with each majority sense which actions were taken, and Figure 65 illustrates the frequency of the final tags. Figure 66 correlates the original majority sense and the final senses.

Figure 65. Final distribution of senses of 'helpen'.

Table 48. Cross-tabulation of original majority senses of ‘helpen’ and actions taken.
original correct idiom majority new remove
helpen_1 14 1 60 17 4
helpen_2 7 0 43 4 1
helpen_3 8 5 40 10 0
no_agreement 10 0 0 4 1
not_listed 0 7 0 3 0
wrong_lemma 0 0 0 0 1
Figure 66. Majority and final senses of 'helpen'.

Reliable cues

Table 49 shows the most frequent context words selected by the annotators as relevant. Table 50, Table 51 and Table 52 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags helpen_1, helpen_2 and helpen_3.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 100 have no cues that match these criteria. 50 have one single cue and 90 have more than one (up to 10).

Across senses

The most frequent cues for these senses are not very frequent: in helpen_1, the te complement stands out, while het helpt (niet) seems to be quite typical for helpen_3.

Table 49. Frequency of cues by sense, counted by type.
Rank helpen_1 n helpen_2 n helpen_3 n helpen_5 n om zeep helpen n remove n
1 te/comp 10 bij/prep 4 het/det 11 aan/prep 3 om/fixed 2 !/punct 1
2 een/det 5 hem/pron 3 niet/adv 8 een/det 2 zeep/fixed 2 de/det 1
3 om/comp 5 te/comp 3 zal/verb 5 bruid/noun 1 om/adj 1 het/noun 1
4 ons/pron 5 commissaris/noun 2 dat/det 4 goed/adj 1 zeep/noun 1 kan/verb 1
5 de/det 4 deze/det 2 niets/noun 3 hij/pron 1 0 niet/adv 1
6 mens/noun 4 Europa/name 2 alleen/adv 2 kaart/noun 1 0 uit/prep 1
7 ik/pron 3 het/det 2 bij/prep 2 rood/adj 1 0 wereld/noun 1
8 met/prep 3 ik/pron 2 de/det 2 0 0 0
9 bovenop/prep 2 met/prep 2 aanpak/noun 1 0 0 0
10 familie_lid/noun 2 moet/verb 2 aanschaf/noun 1 0 0 0

helpen_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest four slots to the left of the target, one or maybe two steps away in the dependency path, mainly as direct object (#T->obj1:CW) but also complementizer (CW->body:#T, mostly filled by te).

The five cues beyond the sentence belong to two tokens: in one case, the context words inside the sentence are indeed not that informative, but in the other one there are also enough cues within the sentence.

Table 50. Frequency of context words as cues of helpen_1 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 te/comp 10 L2 29 #T->obj1:CW 25 1 62
2 een/det 5 L1 21 CW->body:#T 10 2 39
3 om/comp 5 L3 16 #T->mod:CW 8 3 18
4 ons/pron 5 L4 14 NA 5 4 14
5 de/det 4 R1 7 #T->ld:CW 4 5 7
6 mens/noun 4 R2 7 #T->pc:CW 4 NA 5
7 ik/pron 3 L5 6 #T->su:CW 4 6 2
8 met/prep 3 R3 6 CW->body:te->body:#T 4 7 2
9 bovenop/prep 2 L10 4 #T->obj1:en->cnj:CW 3 14 1
10 familie_lid/noun 2 L13 4 #T->pc:met->obj1:CW 3 0

helpen_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest slot to either side of the target, one or maybe two steps away in the dependency path, mainly as direct object (#T->obj1:CW) or verb complement (#T->vc:CW, te in “helpen te bevrijden”, bevrijden in “helpen bevrijden”).

Table 51. Frequency of context words as cues of helpen_2 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 bij/prep 4 R1 20 #T->obj1:CW 16 1 50
2 hem/pron 3 L1 12 #T->vc:CW 13 2 28
3 te/comp 3 L2 10 #T->su:CW 9 3 14
4 commissaris/noun 2 R3 9 #T->pc:CW 6 4 9
5 deze/det 2 L3 6 #T->pc:bij->obj1:CW 5 5 3
6 Europa/name 2 R2 6 CW->vc:#T 5 0
7 het/det 2 R5 6 #T->vc:te->body:CW 3 0
8 ik/pron 2 R4 5 moet->[vc:#T,su:CW] 2 0
9 met/prep 2 L5 4 #T->ld:in->obj1:CW 1 0
10 moet/verb 2 L6 4 #T->ld:in->obj1:eenmanszaak->mod:van->obj1:CW 1 0

helpen_3

Next to the list of types that were selected as cues, we can see that they mostly occur in the first slot to either side of the target, one or maybe two steps away in the dependency path, mainly as subject (#T->su:CW) or modifier (#T->mod:CW, mostly niet) of the target. The one context word outside the sentence occurs both inside and outside.

Table 52. Frequency of context words as cues of helpen_3 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 het/det 11 L1 24 #T->su:CW 19 1 50
2 niet/adv 8 R1 13 #T->mod:CW 14 2 14
3 zal/verb 5 L3 10 #T->obj1:CW 6 3 9
4 dat/det 4 L2 7 CW->vc:#T 5 4 8
5 niets/noun 3 L4 6 #T->pc:CW 2 5 2
6 alleen/adv 2 L5 5 #T->su:Miracle->mod:samengesteld->pc:uit->obj1:en->cnj:CW 2 NA 1
7 bij/prep 2 R2 5 heb->[vc:#T,su:of->cnj:aanpak->mod:CW] 2 - 0
8 de/det 2 R3 5 zal->[vc:#T,su:CW] 2 - 0
9 aanpak/noun 1 L8 2 ->ROOT:op->dp:zal->[vc:#T,ROOT:CW] 1 - 0
10 aanschaf/noun 1 R4 2 #T->ld:CW 1 - 0

Most frequent dependency paths

Figure 67 shows the most frequent dependency paths colored by sense tag. The top two paths are almost exclusive to om zeep helpen and indicate a fixed expression. Except for the subject, which is frequent for helpen_3, all of these paths are quite frequent in tokens of helpen_5. Direct objects are mostly present in helpen_1 and helpen_4, but also in helpen_2, and modifiers are fairly frequent as well.

Figure 67. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • headlines (8 tokens, mostly from helpen_1);
  • special collocation (3 tokens of helpen_2 with een handje);
  • resultative construction, with a preposition or adverb, such as vooruit helpen, bovenop helpen, uit iets helpen (21 tokens, mostly of helpen_1 but also helpen_2 and helpen_4).

Removed tokens

Seven tokens will be removed because they instantiate very infrequent senses or idiomatic expressions or, in two cases, because they could equally refer to helpen_1 or helpen_2. The latter cases could be included in some models to see whether they are modelled in an intermediate position, but they are so rare that this might not be worth pursuing.


HARDEN

Original senses and annotations

The tokens of harden were annotated with 5 senses in 8 batches; the tags in Table 53 were suggested.

Table 53. Original definitions of ‘harden’.
Definitions
harden_1
(trans.) hard maken, in letterlijke zin: staal harden
(trans.) make hard, in literal sense: harden steel
harden_2
(intrans.) hard worden, in letterlijke zin: snel hardende verven
(intrans.) become hard, in literal sense: quickly hardening paint
harden_3
(trans.) hard maken in figuurlijke zin; weerstand en veerkracht bijbrengen: een kind harden tegen het klimaat
(trans.) make hard in figurative sense; impart resistance and resilience: toughen a child against the weather
harden_4
(reflex.) bij zichzelf weerstand en veerkracht aankweken: zich harden tegen het lot
(reflex.) develop resistance and resilience by oneself: toughen oneself against fate
harden_5
(trans.) uithouden, verdragen: niet te harden
(trans.) endure, tolerate: unbearable (‘not to bear’)

Figure 68 shows the sense distribution by annotator and batch and Figure 69, that of the disagreements. Figure 70 shows the sense tags that each annotator of each batch assigned to the tokens with harden_2 as majority sense, Figure 71 those for harden_3, Figure 72 for harden_4 and Figure 73 for harden_5. harden_1 is too infrequent to require a plot.

General distribution

The fifth sense is by far the most frequent in all the batches, followed by harden_3. The rest of the senses are quite infrequent; in some cases even the wrong_lemma tag is more frequent than they are. That said, there is little disagreement, and it is concentrated in tokens with harden_3 or wrong_lemma as majority sense.

There is only one token with no agreement, which is an instance of the adjective hard, and 34 (10.62% of the tokens) with wrong_lemma as majority sense, which are instances of surnames or hard as an adjective or adverb (Table 54).
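The notions of majority sense and no_agreement used throughout this section can be made concrete with a minimal sketch. It assumes that each token carries a list of 3-4 annotator tags. The function name `majority_sense`, the example tokens and the tie-handling rule (two senses tied for the top count are treated as no agreement) are illustrative assumptions, not the procedure that was actually used.

```python
from collections import Counter

def majority_sense(tags, min_votes=2):
    """Return the tag chosen by at least `min_votes` annotators, else 'no_agreement'.

    Assumption: a tie between the two most frequent tags (e.g. 2 vs 2 among
    four annotators) is treated as no agreement.
    """
    counts = Counter(tags).most_common()
    top_tag, top_n = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_n
    if top_n >= min_votes and not tied:
        return top_tag
    return "no_agreement"

# Hypothetical annotations for three tokens of 'harden'
examples = {
    "token_a": ["harden_5", "harden_5", "harden_5"],     # full agreement
    "token_b": ["harden_3", "harden_3", "harden_4"],     # harden_4 is the alternative
    "token_c": ["harden_3", "wrong_lemma", "harden_5"],  # no agreement
}
result = {tok: majority_sense(tags) for tok, tags in examples.items()}
print(result)
```

In this sketch a "geen tag" category such as wrong_lemma is counted like any other tag, which is why it can itself become a majority sense.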

Figure 68. Distribution of senses of ‘harden’ per annotator and batch.

Figure 69. Distribution of disagreeing annotations of ‘harden’ per annotator and batch.

Disagreement in harden_1

There are only four tokens with harden_1 as majority sense: one in batch 1 has full agreement, but the other three, in batches 1, 2 and 5, have harden_2 as alternative.

Disagreement in harden_2

There are 0 to 3 tokens of this sense per batch, five of which have harden_1 as alternative. Five of them are actually instances of uitharden.

Figure 70. Sense annotations of tokens with ‘harden_2’ as majority sense.

Disagreement in harden_3

This sense covers about 10%-30% of each batch, with considerable disagreement. In many cases harden_4 is the alternative annotation, but other tags occur as well.

Figure 71. Sense annotations of tokens with ‘harden_3’ as majority sense.

Disagreement in harden_4

This sense covers 0-3 tokens per batch, in 3 cases with harden_3 as alternative; two of those are rather instances of harden_3.

Figure 72. Sense annotations of tokens with ‘harden_4’ as majority sense.

Disagreement in harden_5

This sense covers about 50%-75% of each batch, with virtually no disagreement.

Figure 73. Sense annotations of tokens with ‘harden_5’ as majority sense.

Final senses

The final definitions are the same as the original definitions: no (sub)senses were added or modified.

Original versus final sense distribution

Of the 320 tokens of harden, 275 kept their original majority senses, 4 were corrected to another original sense, and 41 were removed.

Table 55 shows, for each original majority sense, how many tokens underwent each action, and Figure 74 illustrates the frequency of the final tags. Figure 75 correlates the original majority sense and the final senses.
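The totals reported for harden can be cross-checked against the cross-tabulation in Table 55. The short sketch below copies the counts from that table and sums each action column; the variable names are ours and purely illustrative.

```python
# Counts copied from Table 55: (correct, majority, remove) per original majority sense
table_55 = {
    "harden_1": (1, 3, 0),
    "harden_2": (0, 9, 5),
    "harden_3": (1, 63, 1),
    "harden_4": (2, 9, 0),
    "harden_5": (0, 191, 0),
    "no_agreement": (0, 0, 1),
    "wrong_lemma": (0, 0, 34),
}

# Sum each action column over all original majority senses
corrected = sum(row[0] for row in table_55.values())
kept = sum(row[1] for row in table_55.values())
removed = sum(row[2] for row in table_55.values())
total = corrected + kept + removed

print(f"kept={kept}, corrected={corrected}, removed={removed}, total={total}")
# prints: kept=275, corrected=4, removed=41, total=320
```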

Figure 74. Final distribution of senses of ‘harden’.

Table 55. Cross-tabulation of original majority senses of ‘harden’ and actions taken.
original correct majority remove
harden_1 1 3 0
harden_2 0 9 5
harden_3 1 63 1
harden_4 2 9 0
harden_5 0 191 0
no_agreement 0 0 1
wrong_lemma 0 0 34

Figure 75. Majority and final senses of ‘harden’.

Reliable cues

Table 56 shows the most frequent context words selected by the annotators as relevant. Table 57, Table 58 and Table 59 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags harden_3, harden_4 and harden_5. harden_1 and harden_2 won’t be shown because they are too infrequent: the highest ranked cue based on any attribute has a frequency of 4.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 320 tokens, 23 have no cues that match these criteria, 64 have a single cue and 233 have more than one (up to 9).
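The selection criterion for reliable cues can be sketched as follows. The data layout (one pair of sense tag and cue set per annotator), the function name `reliable_cues` and the example token are hypothetical; only the rule itself, at least two annotators who also assigned the final sense, comes from the description above.

```python
from collections import Counter

def reliable_cues(annotations, final_sense, min_annotators=2):
    """Return the context words picked as cues by at least `min_annotators`
    annotators whose sense tag equals the final sense of the token.

    `annotations` is a list of (sense_tag, cue_words) pairs, one per annotator.
    """
    votes = Counter()
    for sense, cues in annotations:
        if sense == final_sense:
            votes.update(set(cues))  # each annotator counts once per context word
    return {cw for cw, n in votes.items() if n >= min_annotators}

# Hypothetical token with harden_5 as final sense
token = [
    ("harden_5", {"niet/adv", "te/comp", "stank/noun"}),
    ("harden_5", {"niet/adv", "te/comp"}),
    ("harden_3", {"stank/noun", "te/comp"}),  # dissenting annotator: ignored
]
cues = reliable_cues(token, final_sense="harden_5")
print(sorted(cues))
```

Here stank/noun is dropped because only one of the annotators who assigned harden_5 selected it; the selection by the dissenting annotator does not count.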

Across senses

Most of the senses are too infrequent to have stable frequent cues. For harden_5, te and niet are of course the main cues, along with frequent themes (things that cannot be tolerated) such as pijn and stank. The reflexive pronoun is a relatively frequent cue for the reflexive reading, harden_4, and zijn seems relatively frequent for harden_3.

Table 56. Frequency of cues by sense, counted by type.
Rank harden_1 n harden_2 n harden_3 n harden_4 n harden_5 n remove n
1 draai/verb 1 laat/verb 2 ben/verb 10 zich/pron 7 te/comp 171 lab_euro/noun 4
2 gebruik/verb 1 beton/noun 1 door/prep 9 ge/pron 1 niet/adv 153 werk/verb 3
3 hand_vat/noun 1 droog/verb 1 heb/verb 6 hij/pron 1 pijn/noun 41 gewerkt/adj 2
4 het/det 1 gips/noun 1 hij/pron 4 pantser/noun 1 stank/noun 35 Amerikaans/adj 1
5 huid/noun 1 golfplaat/noun 1 me/pron 4 uzelf/pron 1 meer/adv 32 ben/verb 1
6 oppervlak/noun 1 grit/noun 1 mentaal/adj 4 zal/verb 1 hitte/noun 9 bewijs/noun 1
7 slijp/verb 1 kassei/noun 1 dat/det 3 zichzelf/pron 1 nauwelijks/adv 9 bezig/adj 1
8 voetzool/noun 1 lijm/noun 1 het/det 3 - 0 lawaai/noun 7 d/noun 1
9 workshop/noun 1 plateau/noun 1 in/prep 3 - 0 geur/noun 5 Dick/name 1
10 - 0 rij_over/verb 1 leven/noun 3 - 0 ben/verb 4 dreigend/adj 1

harden_3

Beyond the list of types selected as cues, Table 57 shows that they mostly occur in the closest three slots to the left of the target and the first to the right, up to two steps away in the dependency path, mostly as modifier (#T->mod:CW), direct object (#T->obj1:CW) or verb of which the target is a complement (CW->vc:#T).

Table 57. Frequency of context words as cues of harden_3 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 ben/verb 10 L1 22 #T->mod:CW 19 1 55
2 door/prep 9 L2 18 #T->obj1:CW 17 2 42
3 heb/verb 6 L3 16 CW->vc:#T 15 3 12
4 hij/pron 4 R1 12 #T->mod:door->obj1:CW 10 4 11
5 me/pron 4 L4 10 heb->[vc:#T,su:CW] 8 6 2
6 mentaal/adj 4 R3 10 ben->[vc:#T,su:CW] 6 7 2
7 dat/det 3 L5 8 #T->mod:in->obj1:CW 4 NA 2
8 het/det 3 R2 8 #T->su:CW 4 5 1
9 in/prep 3 R4 7 #T->mod:tegen->obj1:CW 3 - 0
10 leven/noun 3 L10 3 en->[cnj:#T,cnj:CW] 3 - 0
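As a rough check of the positional description of harden_3, the position codes of Table 57 can be converted to signed offsets and the share of cues inside the described window computed. This is only a sketch: the helper `offset` is our own, and because the table lists just the ten most frequent positions, the share is relative to the listed cues rather than to all cues of this sense.

```python
import re

def offset(position):
    """Convert a position code such as 'L2' or 'R1' to a signed token offset."""
    side, dist = re.fullmatch(r"([LR])(\d+)", position).groups()
    return -int(dist) if side == "L" else int(dist)

# Position counts copied from Table 57 (ten most frequent positions only)
positions = {"L1": 22, "L2": 18, "L3": 16, "R1": 12, "L4": 10,
             "R3": 10, "L5": 8, "R2": 8, "R4": 7, "L10": 3}

# Window described in the text: three slots to the left, one to the right
in_window = sum(n for pos, n in positions.items() if -3 <= offset(pos) <= 1)
listed = sum(positions.values())
share = in_window / listed

print(f"{in_window} of {listed} listed cues ({share:.0%}) fall within the window")
```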

harden_4

Beyond the list of types selected as cues, Table 58 shows that they mostly occur in the closest two slots to either side of the target, one step away in the dependency path, mainly as direct object (#T->obj1:CW) of the target: the parser has not recognized the reflexive pronoun as a reflexive complement.

Table 58. Frequency of context words as cues of harden_4 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 zich/pron 7 L2 3 #T->obj1:CW 8 1 8
2 ge/pron 1 R1 3 CW->vc:en->cnj:#T 1 2 2
3 hij/pron 1 L1 2 en->[cnj:#T,cnj:vorm->obj1:CW] 1 3 2
4 pantser/noun 1 L3 1 moet->[vc:#T,su:CW] 1 NA 1
5 uzelf/pron 1 L4 1 zal->vc:en->[cnj:#T,su:CW] 1 - 0
6 zal/verb 1 L7 1 NA 1 - 0
7 zichzelf/pron 1 L8 1 - 0 - 0
8 - 0 R8 1 - 0 - 0
9 - 0 - 0 - 0 - 0
10 - 0 - 0 - 0 - 0

harden_5

Beyond the list of types selected as cues, Table 59 shows that they mostly occur in the first two slots to the left of the target and one step away in the dependency path, mainly as modifier (#T->mod:CW, filled mostly by niet but also nauwelijks, amper…) or as the complementizer on which the target depends (CW->body:#T, filled by te). The path ben->vc:te->[body:#T,su:CW], which should actually be expressed as ben->[vc:te->body:#T,su:CW], links pijn, stank and other objects (which would be objects of harden but are subjects of zijn).

Table 59. Frequency of context words as cues of harden_5 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 te/comp 171 L1 173 CW->body:#T 159 1 339
2 niet/adv 153 L2 145 #T->mod:CW 157 3 113
3 pijn/noun 41 L3 85 ben->vc:te->[body:#T,su:CW] 86 2 58
4 stank/noun 35 L4 47 #T->mod:niet->mod:CW 30 4 21
5 meer/adv 32 L5 26 #T->obj1:CW 16 NA 11
6 hitte/noun 9 L6 15 NA 11 5 7
7 nauwelijks/adv 9 R1 13 CW->mod:te->body:#T 8 6 2
8 lawaai/noun 7 L7 9 ben->vc:te->[body:#T,su:en->cnj:CW] 5 7 2
9 geur/noun 5 L8 9 CW->vc:te->body:#T 5 8 1
10 ben/verb 4 L9 8 CW->vc:#T 4 10 1
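The relation between the dependency path notation and the path length can be illustrated with a small sketch. It assumes that every labelled relation in a path string (a label followed by a colon, such as mod: or body:) counts as one step between the target #T and the context word CW, which is consistent with one-step paths like #T->mod:CW. The function `path_steps` is hypothetical and not the code that produced the steps column.

```python
import re

def path_steps(path):
    """Count the labelled dependency relations in a path string.

    Assumption: each 'label:' is one step between target (#T) and context
    word (CW); 'NA' marks a context word without a path to the target.
    """
    if path == "NA":
        return None
    return len(re.findall(r"\w+:", path))

examples = [
    "#T->mod:CW",                   # e.g. niet modifying the target
    "CW->body:#T",                  # te as head of the target
    "#T->mod:niet->mod:CW",         # a modifier of niet
    "ben->[vc:te->body:#T,su:CW]",  # subject of zijn, in the corrected notation
    "NA",
]
steps = {path: path_steps(path) for path in examples}
for path, n in steps.items():
    print(f"{path}: {n}")
```

Under this assumption the bracketed path that links pijn or stank to the target is three steps long: from the target up to te, from te up to the form of zijn, and down to its subject.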

Most frequent dependency paths

Figure 76 shows the most frequent dependency paths colored by sense tag. The passive construction seems to be preferred for harden_3, while the complementizer on which the target depends (CW->body:#T) and its extensions are typical of harden_5. The rest of the senses are too infrequent.

Figure 76. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalization (2 tokens of harden_1 and harden_2);
  • atypical context: (25) and (26), of harden_1 and harden_3 respectively. In the former, there seems to be some connection missing between workshops and the string of related verbs; in the latter, the collocation with nationaal is strange.
  (25) . Tijd voor een tweedaagse International Tool Conference met workshops scharen slijpen , harden , handvaten draaien voor beitels , fondswerving en samenwerking van werkplaatsen . Deelnemers
  (26) Op de bank zit weliswaar genoeg talent , maar dat is alleen nationaal gehard . Dat scheelt veel met de internationale maatstaf . ’ Daarom

Removed tokens

A total of 41 tokens will be removed because they are instances of a surname, of uitharden, or of hard as adjective or adverb.


HERSTELLEN

Original senses and annotations

The tokens of herstellen were annotated with 5 senses in 6 batches; the tags in Table 60 were suggested.

Table 60. Original definitions of ‘herstellen’.
Definitions
herstellen_1
(trans.) repareren, de eraan ontstane schade wegwerken: het dak herstellen
(trans.) repair, get rid of the damage in something: repair the roof
herstellen_2
(trans.) tot de vorige toestand terugbrengen, doen terugkeren: de goede verstandhouding herstellen
(trans.) bring back, make return to the previous state: restore the good understanding
herstellen_3
(trans.) goedmaken, weer doen vergeten: een fout herstellen
(trans.) make good, make forget: fix a mistake
herstellen_4
(reflex.) tot de oorspronkelijke toestand terugkeren: de rust herstelt zich
(reflex.) return to the original state: peace is restored
herstellen_5
(intrans.) genezen: van een ziekte herstellen
(intrans.) heal: heal from a disease

Figure 77 shows the sense distribution by annotator and batch and Figure 78, that of the disagreements. Figure 79 shows the sense tags that each annotator of each batch assigned to the tokens with herstellen_1 as majority sense, Figure 80 those for herstellen_2, Figure 81 for herstellen_3, Figure 82 for herstellen_4 and Figure 83 for herstellen_5.

General distribution

The sense distribution varies slightly between batches, but is relatively stable between annotators of the same batch. herstellen_3 is consistently the least frequent, and herstellen_1 is much more frequent in the last two batches than in the other four, while herstellen_4 shows the opposite behaviour and herstellen_2 and herstellen_5 keep a decent frequency in all batches. There is some disagreement, mostly in tokens with herstellen_2 as majority sense and especially from annotator 2 of batch 3, who disagrees with the majority in half of their annotations. This batch also has the largest number of tokens with no agreement, although there are some in all batches. All 12 tokens with no agreement could be assigned a sense, mostly herstellen_2.

Figure 77. Distribution of senses of ‘herstellen’ per annotator and batch.

Figure 78. Distribution of disagreeing annotations of ‘herstellen’ per annotator and batch.

Disagreement in herstellen_1

This sense covers up to 10% of the first four batches, almost with full agreement, and about 30% of the other two, with some occasional alternative annotations of almost any other sense (never of herstellen_4, the reflexive one).

Figure 79. Sense annotations of tokens with ‘herstellen_1’ as majority sense.

Disagreement in herstellen_2

This sense covers 20%-50% of each batch, but with a number of alternative annotations, mostly of herstellen_3.

Figure 80. Sense annotations of tokens with ‘herstellen_2’ as majority sense.

Disagreement in herstellen_3

This sense is attested in 1-3 tokens per batch, mostly with one of the other transitive readings as alternative.

Figure 81. Sense annotations of tokens with ‘herstellen_3’ as majority sense.

Disagreement in herstellen_4

This sense covers 2-9 tokens per batch, although with a number of herstellen_3 alternative annotations, especially from two specific annotators.

Figure 82. Sense annotations of tokens with ‘herstellen_4’ as majority sense.

Disagreement in herstellen_5

This sense covers 10%-30% of each batch, with a small number of alternative annotations.

Figure 83. Sense annotations of tokens with ‘herstellen_5’ as majority sense.

Final senses

One definition was added, based on the actual occurrences of the corpus, so that the final senses are the ones in Table 61. The question is open whether herstellen_6 is a figurative extension of herstellen_5, with financial entities as subjects instead of people, or an intransitive variation from herstellen_2, or the middle point where both meet. Only one annotator suggested this as a separate sense.

Table 61. Final definitions of ‘herstellen’.
code Definition
herstellen_1 (trans.) repair, get rid of the damage in something
herstellen_2 (trans.) bring back, make return to the previous state
herstellen_3 (trans.) make good, make forget
herstellen_4 (reflex.) return to the original state
herstellen_5 (intrans.) heal
herstellen_6 (intrans.) of a financial/economic entity, recover

Original versus final sense distribution

Of the 240 tokens of herstellen, 207 kept their original majority senses, 25 were corrected to another original sense, 7 were assigned a new sense, and 1 was removed.

Table 62 shows, for each original majority sense, how many tokens underwent each action, and Figure 84 illustrates the frequency of the final tags. Figure 85 correlates the original majority sense and the final senses.
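To compare how well the majority sense held up across the senses of herstellen, the counts in Table 62 can be turned into a share of retained tokens per original majority sense. The sketch below is only an illustration based on the table; the notion of a "retention" share and the variable names are ours.

```python
# Counts copied from Table 62: (correct, majority, new, remove) per original majority sense
table_62 = {
    "herstellen_1": (7, 32, 0, 0),
    "herstellen_2": (1, 77, 6, 0),
    "herstellen_3": (2, 9, 0, 0),
    "herstellen_4": (0, 34, 1, 0),
    "herstellen_5": (3, 55, 0, 1),
    "no_agreement": (12, 0, 0, 0),
}

total = sum(sum(row) for row in table_62.values())

# Share of tokens whose final sense is the original majority sense
retention = {
    sense: kept / (corrected + kept + new + removed)
    for sense, (corrected, kept, new, removed) in table_62.items()
}

print(f"total tokens: {total}")
for sense, share in retention.items():
    print(f"{sense}: {share:.1%}")
```

The no_agreement row has a retention of zero by construction, since those tokens have no majority sense to keep.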

Figure 84. Final distribution of senses of ‘herstellen’.

Table 62. Cross-tabulation of original majority senses of ‘herstellen’ and actions taken.
original correct majority new remove
herstellen_1 7 32 0 0
herstellen_2 1 77 6 0
herstellen_3 2 9 0 0
herstellen_4 0 34 1 0
herstellen_5 3 55 0 1
no_agreement 12 0 0 0

Figure 85. Majority and final senses of ‘herstellen’.

Reliable cues

Table 63 shows the most frequent context words selected by the annotators as relevant. Table 64, Table 65, Table 66 and Table 67 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags herstellen_1, herstellen_2, herstellen_4 and herstellen_5. herstellen_3 won’t be shown because it is too infrequent: the highest ranked cue based on any attribute has a frequency of 4.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 240 tokens, 36 have no cues that match these criteria, 95 have a single cue and 109 have more than one (up to 5).

Across senses

The clearest profiles based on lemma-pos combination are those of herstellen_2, with in ere and evenwicht as frequent representative cues, the reflexive reading herstellen_4, with zich, and herstellen_5, with van, the preposition that introduces the damage or disease someone is healing from. The cues for herstellen_1 and herstellen_3 are quite infrequent.

Table 63. Frequency of cues by sense, counted by type.
Rank herstellen_1 n herstellen_2 n herstellen_3 n herstellen_4 n herstellen_5 n
1 het/det 3 ere/noun 10 dat/det 2 zich/pron 31 van/prep 15
2 de/det 2 evenwicht/noun 10 bilateraal/adj 1 de/det 3 een/det 8
3 electrisch/adj 2 in/prep 10 euvel/noun 1 ons/pron 2 ben/verb 4
4 en/vg 2 oorspronkelijk/adj 5 fout/noun 1 situatie/noun 2 blessure/noun 4
5 fiets/noun 2 orde/noun 5 fout_DIM/noun 1 te/comp 2 hij/pron 4
6 leiding/noun 2 contact/noun 4 kwaad/noun 1 daarna/pp 1 te/comp 4
7 word/verb 2 het/det 4 miskleun/noun 1 economie/noun 1 ziekte/noun 3
8 aanvezeling/noun 1 veiligheid/noun 4 misser/noun 1 fonds/noun 1 de/det 2
9 appartement/noun 1 vertrouwen/noun 4 onrecht/noun 1 golfer/noun 1 knie_blessure/noun 2
10 balk/noun 1 democratie/noun 2 probleem/noun 1 heb/verb 1 kwetsuur/noun 2

herstellen_1

Beyond the list of types selected as cues, Table 64 shows that they mostly occur in the first two slots to the left of the target, up to two steps away in the dependency path, mainly as direct object (#T->obj1:CW) of the target. The 11 cues outside the sentence belong to 6 tokens; in one of them, the theme is a pronoun with its antecedent in the previous sentence, but in the rest there are enough cues inside the sentence, or in any case those outside don’t contribute much.

Table 64. Frequency of context words as cues of herstellen_1 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 het/det 3 L2 10 NA 11 1 18
2 de/det 2 L1 9 #T->obj1:CW 10 2 17
3 electrisch/adj 2 L3 6 #T->mod:CW 4 3 12
4 en/vg 2 R1 5 word->[vc:#T,su:CW] 4 NA 11
5 fiets/noun 2 L4 4 #T->obj1:en->cnj:CW 3 4 5
6 leiding/noun 2 L5 4 #T->mod:van->obj1:CW 2 6 2
7 word/verb 2 R2 4 #T->mod:van->obj1:en->cnj:CW 2 5 1
8 aanvezeling/noun 1 R3 4 CW->vc:#T 2 9 1
9 appartement/noun 1 L9 3 en->[cnj:#T,cnj:CW] 2 - 0
10 balk/noun 1 R4 3 ->[ROOT:#T,ROOT:ben->vc:begin->pc:met->obj1:CW] 1 - 0

herstellen_2

Beyond the list of types selected as cues, Table 65 shows that they mostly occur in the closest three slots to the target, one or two steps away in the dependency path, mainly as direct object (#T->obj1:CW) of the target.

Table 65. Frequency of context words as cues of herstellen_2 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 ere/noun 10 L2 34 #T->obj1:CW 46 1 63
2 evenwicht/noun 10 L1 19 #T->pc:in->obj1:CW 15 2 34
3 in/prep 10 L3 16 #T->pc:CW 10 3 17
4 oorspronkelijk/adj 5 L4 11 CW->vc:#T 4 4 6
5 orde/noun 5 L5 9 word->[vc:#T,su:CW] 3 5 2
6 contact/noun 4 R2 6 #T->mod:van->obj1:CW 2 6 1
7 het/det 4 R3 6 #T->obj1:evenwicht->det:CW 2 NA 1
8 veiligheid/noun 4 L9 5 #T->pc:in->obj1:toestand->mod:CW 2 - 0
9 vertrouwen/noun 4 L6 4 ben->[vc:#T,su:CW] 2 - 0
10 democratie/noun 2 R4 3 word->[vc:#T,su:en->cnj:CW] 2 - 0

herstellen_4

Beyond the list of types selected as cues, Table 66 shows that they mostly occur in the first slot to the right of the target, one step away in the dependency path, mainly as reflexive complement (#T->se:CW).

Table 66. Frequency of context words as cues of herstellen_4 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 zich/pron 31 R1 12 #T->se:CW 34 1 46
2 de/det 3 L3 9 #T->su:CW 8 2 4
3 ons/pron 2 L1 8 CW->body:#T 2 3 2
4 situatie/noun 2 L2 7 #T->mod:CW 1 - 0
5 te/comp 2 R2 4 #T->su:fonds->det:CW 1 - 0
6 daarna/pp 1 L5 2 #T->su:golfer->det:CW 1 - 0
7 economie/noun 1 R3 2 #T->su:ploeg->det:CW 1 - 0
8 fonds/noun 1 L10 1 #T->su:situatie->det:CW 1 - 0
9 golfer/noun 1 L11 1 ben->vc:aan->[body:#T,su:CW] 1 - 0
10 heb/verb 1 L12 1 CW->vc:#T 1 - 0

herstellen_5

Beyond the list of types selected as cues, Table 67 shows that they mostly occur in the closest three slots to the right of the target, up to three steps away in the dependency path, mainly as modifier (#T->mod:CW, mostly filled by van) or as object linked through van as either modifier or prepositional complement (#T->mod:van->obj1:CW, #T->pc:van->obj1:CW).

Table 67. Frequency of context words as cues of herstellen_5 by attribute.
Type
Position
Dependency path
Path length
Rank cw_type n position n path n steps n
1 van/prep 15 R3 20 #T->mod:CW 17 1 39
2 een/det 8 R2 19 #T->mod:van->obj1:CW 13 2 37
3 ben/verb 4 R1 16 #T->pc:van->obj1:CW 13 3 27
4 blessure/noun 4 L1 13 #T->pc:CW 9 4 8
5 hij/pron 4 R4 13 #T->su:CW 6 NA 4
6 te/comp 4 L2 9 #T->pc:van->obj1:en->cnj:CW 4 5 3
7 ziekte/noun 3 L3 6 ben->[vc:#T,su:CW] 4 6 2
8 de/det 2 R6 6 CW->vc:#T 4 8 2
9 knie_blessure/noun 2 L4 4 NA 4 9 1
10 kwetsuur/noun 2 R5 4 CW->body:#T 3 10 1

Most frequent dependency paths

Figure 86 shows the most frequent dependency paths colored by sense tag. The reflexive complement is clearly exclusive to herstellen_4, direct objects tend to occur with herstellen_2, and modifiers are fairly frequent.

Figure 86. Tokens per path.

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (9 tokens);
  • headlines (2 tokens of herstellen_1 and herstellen_2);
  • atypical contexts (2 tokens of herstellen_1 without explicit object);
  • special cases (3 tokens of herstellen_3 with a material object such as schade, which could be expected to end up between herstellen_1 and herstellen_3).

Removed tokens

One token will be removed because it is a duplicate of another token.


HAKEN

Original senses and annotations

The tokens of haken were annotated with 5 senses in 9 batches; the tags in Table 68 were suggested.

Table 68. Original definitions of ‘haken’.
Definitions
haken_1
(trans.) met of als met een haak vastmaken (aan, in, achter iets): een wagen aan een locomotief haken, een sleutel in een ring haken
(trans.) fix something with or as if with a hook (to, in, behind something): hook a wagon to a locomotive, a key in a key ring
haken_2
(intrans.) met of als met een haak vastraken: de doornen haakten aan haar jas, haar paraplu bleef haken aan de deurknop
(intrans.) get stuck with or as if with a hook: the thorns got stuck in her coat, her umbrella got stuck in the doorknob
haken_3
(trans.) over een uitgestoken been doen struikelen: hij werd gehaakt in de elfmeter, iemand pootje haken
(trans.) make trip over a stuck out leg: he was made to trip in the penalty kick, make someone trip
haken_4
(intrans., met ‘blijven’) van gedachten, blikken e.d.: haperen, telkens terugkeren (aan of bij iets): ik bleef haken bij de herinnering aan mijn broer
(intrans., with blijven ‘keep’) of thoughts, gazes and such: falter, come back (to something): I kept going back to the memory of my brother
haken_5
(intrans./trans.) zeker handwerk maken door met een staafje met een weerhaak lussen samen te weven: haken tijdens het televisiekijken, hoe ontspannend!, een babymutsje haken
(intrans./trans.) make handcraft by weaving loops together with a hooked needle: crocheting while watching tv, so relaxing!, crochet a baby hat

Figure 87 shows the sense distribution by annotator and batch and Figure 88, that of the disagreements. Figure 89 shows the sense tags that each annotator of each batch assigned to the tokens with haken_1 as majority sense, Figure 90 those for haken_2, Figure 91 for haken_3, Figure 92 for haken_4 and Figure 93 for haken_5.

General distribution

The general sense distribution is quite disparate across and within batches, with a greater presence of haken_1 in batches 4 and 5, and of haken_3 in the last three batches. In 8.61% of the tokens there is no agreement; they are mostly concentrated in batches 1, 3 and 7, but the disagreement rate is in any case quite high.

A large number of tokens did not receive a majority sense among the original suggestions: 31 had no agreement at all, 54 received wrong_lemma as majority sense, 9, not_listed, and 2, unclear.

A number of the tokens without agreement could be matched to one of the original senses, but more than half were either removed, because they belonged to the wrong lemma, or matched to the new sense tag haken_6. All of those with wrong_lemma or unclear as majority sense and some of those with not_listed were removed because they belong to a different lemma, while the rest were linked to a different sense, mainly haken_6.
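The figures on tokens of haken without an original majority sense can be reproduced from the cross-tabulation in Table 70. The sketch below copies the counts from that table; the variable names are illustrative, and the grouping of no_agreement, wrong_lemma, not_listed and unclear follows the description above.

```python
# Counts copied from Table 70: (correct, majority, new, remove) per original majority tag
table_70 = {
    "haken_1": (30, 31, 0, 26),
    "haken_2": (1, 51, 0, 6),
    "haken_3": (3, 65, 0, 2),
    "haken_4": (1, 24, 8, 2),
    "haken_5": (0, 14, 0, 0),
    "no_agreement": (12, 0, 5, 14),
    "not_listed": (1, 0, 5, 3),
    "unclear": (0, 0, 0, 2),
    "wrong_lemma": (0, 0, 0, 54),
}

row_totals = {tag: sum(row) for tag, row in table_70.items()}
total = sum(row_totals.values())

# Tokens whose majority tag is not one of the original senses
no_sense_tags = ["no_agreement", "wrong_lemma", "not_listed", "unclear"]
without_original_sense = sum(row_totals[tag] for tag in no_sense_tags)

# Share of tokens with no agreement, and what happened to them
no_agreement_pct = round(100 * row_totals["no_agreement"] / total, 2)
corrected, _, new, removed = table_70["no_agreement"]
new_or_removed = new + removed

print(f"{without_original_sense} of {total} tokens lack an original majority sense")
print(f"no agreement: {row_totals['no_agreement']} tokens ({no_agreement_pct}%)")
print(f"of these, {corrected} matched an original sense and {new_or_removed} were removed or assigned haken_6")
```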

Figure 87. Distribution of senses of ‘haken’ per annotator and batch.

Figure 88. Distribution of disagreeing annotations of ‘haken’ per annotator and batch.

Disagreement in haken_1

The first sense covers less than 10% of some batches and between 25% and 60% of others, with a number of alternative annotations. The disagreements tend to be focused on one annotator per batch, particularly annotator 3 of batch 8 and annotator 2 from batch 6 with their preference for haken_2.

Figure 89. Sense annotations of tokens with ‘haken_1’ as majority sense.

Disagreement in haken_2

The second sense covers less than 10% in some batches and between 20% and 30% in others, often with haken_1 as alternative annotation.

Figure 90. Sense annotations of tokens with ‘haken_2’ as majority sense.

Disagreement in haken_3

The third sense covers less than 20% in some batches and about 30% in others; it has relatively few alternative annotations, mostly for haken_2 or geen.

Figure 91. Sense annotations of tokens with ‘haken_3’ as majority sense.

Disagreement in haken_4

The fourth sense covers 0 to 12 tokens of each batch, with many cases of not_listed as alternative suggestion, mainly from annotator 2 of batch 3, and a few of other senses.

Figure 92. Sense annotations of tokens with ‘haken_4’ as majority sense.

Disagreement in haken_5

This sense covers 0 to 3 tokens of each batch but always has full agreement.

Figure 93. Sense annotations of tokens with ‘haken_5’ as majority sense.

Final senses

One definition was added, based on the actual occurrences of the corpus and suggestions of the annotators, so that the final senses are the ones in Table 69.

Table 69. Final definitions of ‘haken’.
code Definition
haken_1 (trans.) fix something with or as if with a hook (to, in, behind something)
haken_2 (intrans.) get stuck with or as if with a hook
haken_3 (trans.) make trip over a stuck out leg
haken_4 (intrans., with blijven ‘keep’) of thoughts, gazes and such: falter, come back (to something)
haken_5 (intrans./trans.) make handcraft by weaving loops together with a hooked needle
haken_6 (with naar ‘to’) desire, aim for

Original versus final sense distribution

Of the 360 tokens of haken, 185 kept their original majority senses, 48 were corrected to another original sense, 18 were assigned a new sense, and 109 were removed.

Table 70 shows, for each original majority sense, how many tokens underwent each action, and Figure 94 illustrates the frequency of the final tags. Figure 95 correlates the original majority sense and the final senses.

Figure 94. Final distribution of senses of ‘haken’.

Table 70. Cross-tabulation of original majority senses of ‘haken’ and actions taken.
original correct majority new remove
haken_1 30 31 0 26
haken_2 1 51 0 6
haken_3 3 65 0 2
haken_4 1 24 8 2
haken_5 0 14 0 0
no_agreement 12 0 5 14
not_listed 1 0 5 3
unclear 0 0 0 2
wrong_lemma 0 0 0 54

Figure 95. Majority and final senses of ‘haken’.

Reliable cues

Table 71 shows the most frequent context words selected by the annotators as relevant. Table 72, Table 73, Table 74, Table 75 and Table 76 show the ranking of cues according to different attributes (type, position, path and steps) for the sense tags haken_1, haken_2, haken_3, haken_4 and haken_5.

The count only considers context words chosen by at least two annotators that also assigned the final sense. Of the 360 tokens, 106 have no cues that match these criteria, 67 have a single cue and 187 have more than one (up to 9).

Across senses

Some patterns emerge from the top lemmas selected as cues for the different senses: even the least frequent one, haken_5, has clear cues; for both intransitive senses, haken_2 and haken_4, blijven co-occurs frequently, but the rest of the lemmas differ; the literal senses haken_1 and haken_2 share elkaar and, to a lesser degree, aan as frequent cues; and haken_3 has its own set of football-related cues, such as strafschop and penalty, alongside worden and pootje. There is even a profile for the new sense haken_6, albeit based on very few tokens, and for the removed tokens, which are normally cases of haken en ogen and afhaken.

Table 71. Frequency of cues by sense, counted by type.
Rank haken_1 n haken_2 n haken_3 n haken_4 n haken_5 n haken_6 n remove n
1 aan/prep 18 in/prep 25 word/verb 25 blijf/verb 14 brei/verb 6 naar/prep 3 oog/verb 25
2 elkaar/pron 9 blijf/verb 23 strafschop/noun 13 oog/noun 8 naai/verb 4 dood/noun 1 en/vg 20
3 in/prep 8 elkaar/pron 17 strafschop_gebied/noun 10 aan/prep 5 en/vg 3 macht/noun 1 af/part 13
4 de/det 6 achter/prep 10 poot_DIM/noun 7 blik/noun 2 hobby/noun 2 naar/adj 1 af/adj 7
5 achter/prep 3 aan/prep 4 door/prep 6 in/prep 2 van/prep 2 roem/noun 1 met/prep 7
6 wagon_DIM/noun 3 de/det 4 penalty/noun 6 beeld/noun 1 capeje/noun 1 ruig/adj 1 af/prep 5
7 zijn/det 3 met/prep 4 bal/noun 5 bij/prep 1 en/of/vg 1 woest/adj 1 wat/det 4
8 fiets/noun 2 stuur/noun 4 foutief/adj 5 detail/noun 1 hoed_DIM/noun 1 0 los/adj 3
9 hun/det 2 van/prep 3 in/prep 4 een/det 1 houd/verb 1 0 hang/verb 2
10 trein_DIM/noun 2 het/det 2 de/det 3 ervaring/noun 1 kleed_DIM/noun 1 0 aan/prep 1

haken_1

Next to the list of types that were selected as cues, we can see that they mostly occur in the two or three closest slots to either side of the target, one or two steps away in the dependency path, mainly as locative complement (#T->ld:CW for the preposition, #T->ld:aan->obj1:CW and #T->ld:in->obj1:CW for the object) or direct object (#T->obj1:CW).

Table 72. Frequency of context words as cues of haken_1 by attribute.
Column groups: Type (cw_type, n), Position (position, n), Dependency path (path, n), Path length (steps, n)
Rank cw_type n position n path n steps n
1 aan/prep 18 R2 11 #T->ld:CW 19 1 45
2 elkaar/pron 9 L2 10 #T->obj1:CW 18 2 33
3 in/prep 8 R3 10 #T->ld:aan->obj1:CW 10 3 16
4 de/det 6 L1 8 #T->ld:in->obj1:CW 4 4 4
5 achter/prep 3 R1 8 #T->su:CW 4 6 1
6 wagon_DIM/noun 3 L3 6 #T->mod:achter->obj1:CW 2 7 1
7 zijn/det 3 R6 6 #T->mod:CW 2 0
8 fiets/noun 2 R4 5 #T->obj1:wagon_DIM->det:CW 2 0
9 hun/det 2 L5 4 CW->body:#T 2 0
10 trein_DIM/noun 2 L7 4 #T->ld:aan->obj1:die->mod:CW 1 0
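
The relation between the path and steps columns can be illustrated with a small parser. This is a sketch under an assumption that the source does not state explicitly: that each `relation:node` label counts as one step, and that in branching paths such as blijf->[vc:#T,su:CW] only the labels inside the brackets lie between the target and the context word. The function name path_steps is hypothetical.

```python
import re


def path_steps(path):
    """Count dependency edges between target (#T) and context word (CW).

    Assumption: every `relation:node` label is one edge. In a branching
    path, only the labels inside the brackets are counted, so the prefix
    leading to the shared ancestor is ignored.
    """
    if path in ("NA", ""):
        return None  # context word outside the sentence: no path
    branch = re.search(r"\[(.*)\]", path)
    segment = branch.group(1) if branch else path
    # A label is a relation name followed by ':' (e.g. 'ld:', 'obj1:').
    return len(re.findall(r"[A-Za-z0-9_]+:", segment))


for p in ["#T->ld:CW", "#T->ld:aan->obj1:CW", "CW->vc:#T",
          "blijf->[vc:#T,su:CW]", "NA"]:
    print(p, path_steps(p))
```

Under this assumption, the one-step paths listed in Table 72 (#T->ld:CW, #T->obj1:CW, #T->su:CW, #T->mod:CW and CW->body:#T) add up to 19 + 18 + 4 + 2 + 2 = 45, which matches the count reported there for one step.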

haken_2

Next to the list of types that were selected as cues, we can see that they mostly occur in the three closest slots to either side of the target, one or two steps away in the dependency path, mainly as locative complement (#T->ld:CW for the preposition, #T->ld:in->obj1:CW for the object) or verb of which the target is verbal complement (CW->vc:#T, filled by blijven).

Table 73. Frequency of context words as cues of haken_2 by attribute.
Column groups: Type (cw_type, n), Position (position, n), Dependency path (path, n), Path length (steps, n)
Rank cw_type n position n path n steps n
1 in/prep 25 L1 39 #T->ld:CW 30 1 70
2 blijf/verb 23 L2 18 CW->vc:#T 23 2 50
3 elkaar/pron 17 L3 15 #T->ld:in->obj1:CW 16 3 19
4 achter/prep 10 R1 15 #T->mod:CW 10 4 8
5 aan/prep 4 R3 12 #T->ld:achter->obj1:CW 7 5 4
6 de/det 4 R2 11 #T->su:CW 7 7 2
7 met/prep 4 L4 10 #T->mod:met->obj1:CW 6 8 2
8 stuur/noun 4 L5 7 #T->ld:aan->obj1:CW 5 6 1
9 van/prep 3 R4 7 blijf->[vc:#T,su:CW] 4 NA 1
10 het/det 2 L6 5 #T->su:en->cnj:CW 3 0

haken_3

Next to the list of types that were selected as cues, we can see that they mostly occur in the first two slots to either side of the target, up to three steps away in the dependency path, mainly as verb of which the target is a verbal complement (CW->vc:#T, mostly filled by worden) or modifier of the target (#T->mod:CW, mostly door and foutief).

The 20 cues beyond the sentence correspond to 16 tokens; in all cases there are also cues within the sentence, but those outside it help establish the context of a football match.

Table 74. Frequency of context words as cues of haken_3 by attribute.
Column groups: Type (cw_type, n), Position (position, n), Dependency path (path, n), Path length (steps, n)
Rank cw_type n position n path n steps n
1 word/verb 25 L1 34 CW->vc:#T 29 1 62
2 strafschop/noun 13 L2 30 NA 20 2 47
3 strafschop_gebied/noun 10 R1 18 #T->mod:CW 13 3 30
4 poot_DIM/noun 7 L3 11 word->[vc:#T,su:CW] 10 NA 20
5 door/prep 6 R2 9 #T->obj1:CW 9 4 16
6 penalty/noun 6 R5 9 #T->ld:in->obj1:CW 8 5 5
7 bal/noun 5 R3 8 #T->mod:in->obj1:CW 8 6 2
8 foutief/adj 5 R6 8 #T->su:CW 8 7 2
9 in/prep 4 R4 7 ->[ROOT:#T,ROOT:sta->dp:CW] 2 10 1
10 de/det 3 L4 6 #T->ld:CW 2 11 1

haken_4

Next to the list of types that were selected as cues, we can see that they mostly occur in the closest two or three slots to the left of the target, one or maybe two steps away in the dependency path, mainly as verb of which the target is complement (CW->vc:#T, filled by blijven) but also as its subject and prepositional complements.

Table 75. Frequency of context words as cues of haken_4 by attribute.
Column groups: Type (cw_type, n), Position (position, n), Dependency path (path, n), Path length (steps, n)
Rank cw_type n position n path n steps n
1 blijf/verb 14 L1 13 CW->vc:#T 14 1 26
2 oog/noun 8 L2 12 #T->ld:CW 6 2 16
3 aan/prep 5 L3 7 blijf->[vc:#T,su:CW] 6 3 5
4 blik/noun 2 R1 5 #T->su:CW 5 4 2
5 in/prep 2 L4 3 #T->ld:aan->obj1:CW 3 5 1
6 beeld/noun 1 L5 3 #T->ld:in->obj1:CW 3 0
7 bij/prep 1 R3 2 #T->mod:in->obj1:CW 2 0
8 detail/noun 1 L10 1 #T->ld:aan->obj1:brok_DIM->mod:CW 1 0
9 een/det 1 L7 1 #T->ld:aan->obj1:brok_DIM->mod:poëzie->mod:CW 1 0
10 ervaring/noun 1 L9 1 #T->ld:in->obj1:ervaring->mod:CW 1 0

haken_5

Next to the list of types that were selected as cues, we can see that they mostly occur in the two closest slots to either side of the target, up to two steps away in the dependency path, mainly as conjunct (en->[cnj:#T,cnj:CW]).

Table 76. Frequency of context words as cues of haken_5 by attribute.
Column groups: Type (cw_type, n), Position (position, n), Dependency path (path, n), Path length (steps, n)
Rank cw_type n position n path n steps n
1 brei/verb 6 L1 6 en->[cnj:#T,cnj:CW] 7 2 14
2 naai/verb 4 L2 6 #T->obj1:CW 3 1 9
3 en/vg 3 R2 5 CW->cnj:#T 3 3 4
4 hobby/noun 2 L4 3 #T->ld:aan->obj1:CW 1 4 1
5 van/prep 2 L8 2 #T->mod:CW 1 5 1
6 capeje/noun 1 R1 2 #T->mod:van->obj1:CW 1 NA 1
7 en/of/vg 1 L15 1 #T->su:CW 1 0
8 hoed_DIM/noun 1 L3 1 ben->[vc:#T,su:CW] 1 0
9 houd/verb 1 L6 1 ben->predc:en->[cnj:#T,su:CW] 1 0
10 kleed_DIM/noun 1 L9 1 ben->vc:en->[cnj:#T,su:CW] 1 0

Most frequent dependency paths

Figure 96 shows the 10 most frequent dependency paths colored by sense tag. The passive construction prefers haken_3 or haken_4 and the direct object haken_1, while the locative complement is most frequent with haken_1 and haken_2.

Figure 96. Tokens per path.
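
A rough cross-sense comparison can also be read off Tables 72 to 76. The sketch below transcribes the cue counts of five paths from those tables and ranks the senses for each path. Two caveats apply: these are cue counts, not the token counts plotted in Figure 96, and a sense is only listed for a path that appears in its top 10, so a missing entry does not necessarily mean zero. The names PATH_COUNTS and ranked_senses are introduced for illustration.

```python
# Cue counts per dependency path, transcribed from Tables 72-76.
# Only paths in a sense's top 10 are listed, so a missing entry means
# "not in the top 10", not necessarily zero.
PATH_COUNTS = {
    "#T->ld:CW":   {"haken_1": 19, "haken_2": 30, "haken_3": 2, "haken_4": 6},
    "#T->obj1:CW": {"haken_1": 18, "haken_3": 9, "haken_5": 3},
    "CW->vc:#T":   {"haken_2": 23, "haken_3": 29, "haken_4": 14},
    "#T->su:CW":   {"haken_1": 4, "haken_2": 7, "haken_3": 8,
                    "haken_4": 5, "haken_5": 1},
    "#T->mod:CW":  {"haken_1": 2, "haken_2": 10, "haken_3": 13, "haken_5": 1},
}


def ranked_senses(path):
    """Return (sense, count, share) triples for a path, most frequent first."""
    counts = PATH_COUNTS[path]
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda kv: -kv[1])
    return [(sense, n, round(n / total, 2)) for sense, n in ordered]


for path in PATH_COUNTS:
    print(path, ranked_senses(path))
```

Note that CW->vc:#T covers both worden (the passive, typical of haken_3) and blijven (typical of haken_2 and haken_4), so its cue counts are not limited to the passive construction.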

Tracking lists

For the examination of the clouds, some lists were compiled with tokens that could be interesting to track. For this lemma, these include:

  • nominalizations (4 tokens, of haken_5 and haken_6);
  • garden-path tokens, as is the case of (27), of haken_1 but with a human being as object;
  • headlines (2 tokens, from haken_2);
  • special cases (2 tokens of haken_1 that actually mean “to unhook”, 3 tokens with zich as object);
  • an idiomatic expression (7 tokens of haken_1 where the object being hooked is a metaphorical wagon or similar).
  (27) haak in zijn hand naar de oppervlakte werd gesleurd . " Deze gek haakte me toen hij voor marlijn ging " , aldus de duiker . "

Removed tokens

Two tokens will be removed because no clear sense could be assigned or the sense was too infrequent, two because they are duplicates of other tokens and 107 because they do not correspond to the target lemma:

  • 31 are instances of haken en ogen and a further 6 of the noun haak;
  • 31 are instances of afhaken and a further 7 of other separable verbs, namely inhaken, aanhaken, doorhakken and ophaken;
  • 24 are instances of vasthaken (a synonym of haken_1 and haken_2 that the annotators did not identify as a separate lemma) and 6 are instances of an antonym, which was only occasionally identified.

  1. There used to be 13, but herkennen proved to be too messy, so it goes in storage for now.

  2. In one case, there is a context word tagged by the annotators, contacten (what dat was referring to), that occurs both inside and outside the sentence. Because of a bug in the annotation tool, the first instance, outside the sentence, might have been tagged instead of the second one, and it might be the case that the annotators did not correct it after being warned of the bug.

  3. One of these tokens had six “cues” beyond the sentence: the individual cues are not relevant per se, but the whole clause they form is part of the antecedent of dat, the object of the target:

    door het raam zouden hebben toegekeken . De schoonvader heeft dat later overigens herroepen . De raadsheren proberen de gangen van de vier te volgen om zich

  4. In one case, the annotators selected the context words “Voetballer” and “sportlaureaat”, from the previous sentence, instead of cues inside the sentence of the target:

    2001-01-18 jean eykmans Voetballer Bart Goor kanshebber voor titel Geelse sportlaureaat Volgend weekend huldigen de sportraden van Geel , Laakdal en Meerhout hun individuele sporters en sportverenigingen die zich vorig jaar onderscheidden in hun discipline .

  5. One of them and the third annotator did agree on a relevant context word, namely the head of the passive subject, but the third annotator assigned the intransitive reading herstructureren_3, so they did not agree on the sense tag.